The fix Roland mentions is included in Lustre 1.4.10; you can also find it here: https://bugzilla.lustre.org/attachment.cgi?id=8709
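If you are not sure which version a given server is already running, a quick check (this assumes the procfs layout used by the 1.4.x/1.6.x releases) is:

  # print the Lustre version the loaded modules report
  cat /proc/fs/lustre/version

  # the userspace tools print their own version as well
  lctl --version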
-therese (HP SFS Support)

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: 02 January 2008 13:23
To: [email protected]
Subject: Lustre-discuss Digest, Vol 24, Issue 2

Today's Topics:

   1. lustre quota problems (Patrick Winnertz)
   2. Re: lustre quota problems (Roland Laifer)
   3. Re: help needed. (Aaron Knister)

----------------------------------------------------------------------

Message: 1
Date: Wed, 2 Jan 2008 11:27:56 +0100
From: Patrick Winnertz <[EMAIL PROTECTED]>
Subject: [Lustre-discuss] lustre quota problems
To: Lustre-discuss <[email protected]>

Hello,

I have run into problems with quotas on our test cluster: when I set a user's quota to a given value (e.g. the values used in the operations manual), I can write exactly the amount set with setquota. But after I delete the file(s), I cannot use that space again.

Here is what I did in detail:

  lfs quotacheck -ug /mnt/testfs
  lfs setquota -u winnie 307200 309200 10000 11000 /mnt/testfs

Then I wrote one single big file with dd:

  dd if=/dev/zero of=/mnt/testfs/test

As expected, it stops writing once the file is about 300 MB. But after removing the file, restarting dd produces a zero-sized file, because the disk quota is still reported as exceeded.

Has anybody seen this behaviour, and does anyone know what is wrong here? (I suspect some values are cached.)

Thanks in advance!
Patrick Winnertz

--
Patrick Winnertz
Tel.: +49 (0) 2161 / 4643 - 0
credativ GmbH, HRB Mönchengladbach 12080
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz

------------------------------

Message: 2
Date: Wed, 2 Jan 2008 11:51:28 +0100
From: Roland Laifer <[EMAIL PROTECTED]>
Subject: Re: [Lustre-discuss] lustre quota problems
To: Patrick Winnertz <[EMAIL PROTECTED]>
Cc: Lustre-discuss <[email protected]>

Hello,

we had the same problem with our Lustre software from HP (HP SFS). HP opened CFS bug 12431 about it, which is visible neither to the public nor to us, so I am not sure which Lustre version includes the corresponding fix. HP provided a fix on top of their newest SFS version which solved the problem.

Here is part of the explanation of the problem: files which did not decrease the quota when they were deleted had inode->i_dquot set to NULL, which should not happen. The root cause was in filter_destroy() and filter_commitrw_commit().
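For anyone who wants to verify whether their installation is affected (or that an upgrade fixed it), a minimal reproduction along the lines of Patrick's test should do; the user name, limits and mount point below are just his example values:

  # enable quota accounting and set limits for the test user
  lfs quotacheck -ug /mnt/testfs
  lfs setquota -u winnie 307200 309200 10000 11000 /mnt/testfs

  # write until the block limit is hit, then delete the file
  dd if=/dev/zero of=/mnt/testfs/test bs=1M
  rm /mnt/testfs/test

  # report the user's usage: on an affected system the used kbytes stay
  # near the 307200 soft limit even though the file is gone; on a fixed
  # system they drop back towards zero
  lfs quota -u winnie /mnt/testfs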
Regards,
Roland

--
Roland Laifer
Rechenzentrum, Universitaet Karlsruhe (TH), D-76128 Karlsruhe, Germany
Email: [EMAIL PROTECTED], Phone: +49 721 608 4861, Fax: +49 721 32550
Web: www.rz.uni-karlsruhe.de/personen/roland.laifer

------------------------------

Message: 3
Date: Wed, 2 Jan 2008 08:22:38 -0500
From: Aaron Knister <[EMAIL PROTECTED]>
Subject: Re: [Lustre-discuss] help needed.
To: Avi Gershon <[EMAIL PROTECTED]>
Cc: Yan Benhammou <[EMAIL PROTECTED]>, [email protected], Meny Ben moshe <[EMAIL PROTECTED]>

On the host x-math20, could you run "lctl list_nids" and also "ifconfig -a"? I want to see whether LNET is listening on the correct interface. Could you also post the contents of your /etc/modprobe.conf?

Thanks!

-Aaron
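For context on why modprobe.conf matters here: LNET chooses the interface it listens on from the lnet module options, so if that line points at the wrong NIC the NID it registers will not match the address the other nodes try to ping. A minimal sketch of the relevant line (eth0 is only a placeholder; use whichever interface carries the 132.66.176.x addresses):

  # /etc/modprobe.conf -- bind LNET's tcp network to a specific interface
  options lnet networks=tcp0(eth0)

  # reload the modules afterwards so the NID is re-registered
  lustre_rmmod
  modprobe lnet
  lctl network up
  lctl list_nids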
On Jan 2, 2008, at 4:42 AM, Avi Gershon wrote:

> Hello everyone, and happy new year.
> I think I have reduced my problem to this: "lctl ping [EMAIL PROTECTED]" does not work for me, for some strange reason, as you can see:
>
> [EMAIL PROTECTED] ~]# lctl ping [EMAIL PROTECTED]
> failed to ping [EMAIL PROTECTED]: Input/output error
> [EMAIL PROTECTED] ~]# ping 132.66.176.211
> PING 132.66.176.211 (132.66.176.211) 56(84) bytes of data.
> 64 bytes from 132.66.176.211: icmp_seq=0 ttl=64 time=0.152 ms
> 64 bytes from 132.66.176.211: icmp_seq=1 ttl=64 time=0.130 ms
> 64 bytes from 132.66.176.211: icmp_seq=2 ttl=64 time=0.131 ms
> --- 132.66.176.211 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2018ms
> rtt min/avg/max/mdev = 0.130/0.137/0.152/0.016 ms, pipe 2
> [EMAIL PROTECTED] ~]#
>
> On 12/24/07, Avi Gershon <[EMAIL PROTECTED]> wrote:
> > Hi,
> > here are the "iptables -L" results:
> >
> > NODE 1, 132.66.176.212 (Scientific Linux CERN SLC release 4.6 (Beryllium)):
> >
> > [EMAIL PROTECTED] ~]# iptables -L
> > Chain INPUT (policy ACCEPT)
> > target     prot opt source      destination
> >
> > Chain FORWARD (policy ACCEPT)
> > target     prot opt source      destination
> >
> > Chain OUTPUT (policy ACCEPT)
> > target     prot opt source      destination
> >
> > MDT, 132.66.176.211:
> >
> > [EMAIL PROTECTED] ~]# iptables -L
> > Chain INPUT (policy ACCEPT)
> > target     prot opt source      destination
> >
> > Chain FORWARD (policy ACCEPT)
> > target     prot opt source      destination
> >
> > Chain OUTPUT (policy ACCEPT)
> > target     prot opt source      destination
> >
> > NODE 2, 132.66.176.215:
> >
> > [EMAIL PROTECTED] ~]# iptables -L
> > Chain INPUT (policy ACCEPT)
> > target     prot opt source      destination
> > RH-Firewall-1-INPUT  all  --  anywhere    anywhere
> >
> > Chain FORWARD (policy ACCEPT)
> > target     prot opt source      destination
> > RH-Firewall-1-INPUT  all  --  anywhere    anywhere
> >
> > Chain OUTPUT (policy ACCEPT)
> > target     prot opt source      destination
> >
> > Chain RH-Firewall-1-INPUT (2 references)
> > target     prot opt source      destination
> > ACCEPT     all  --  anywhere    anywhere
> > ACCEPT     icmp --  anywhere    anywhere     icmp any
> > ACCEPT     ipv6-crypt--  anywhere    anywhere
> > ACCEPT     ipv6-auth--  anywhere    anywhere
> > ACCEPT     udp  --  anywhere    224.0.0.251  udp dpt:5353
> > ACCEPT     udp  --  anywhere    anywhere     udp dpt:ipp
> > ACCEPT     all  --  anywhere    anywhere     state RELATED,ESTABLISHED
> > ACCEPT     tcp  --  anywhere    anywhere     state NEW tcp dpts:30000:30101
> > ACCEPT     tcp  --  anywhere    anywhere     state NEW tcp dpt:ssh
> > ACCEPT     udp  --  anywhere    anywhere     state NEW udp dpt:afs3-callback
> > REJECT     all  --  anywhere    anywhere     reject-with icmp-host-prohibited
> > [EMAIL PROTECTED] ~]#
> >
> > One more thing: do you use the TCP protocol, or UDP?
> >
> > Regards, Avi
> > P.S. I think this is the beginning of a beautiful friendship.. :-)
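From the listings above, node 2 (132.66.176.215) is the only machine with a restrictive RH-Firewall-1-INPUT chain, and it ends in a catch-all REJECT with no rule for the Lustre port. On the protocol question: LNET over Ethernet (socklnd) uses TCP, not UDP, and listens on port 988 by default. A possible fix, as a sketch to be checked against your own site policy, would be to open that port on node 2:

  # on 132.66.176.215: allow incoming LNET/Lustre traffic (TCP port 988)
  iptables -I RH-Firewall-1-INPUT -p tcp --dport 988 -j ACCEPT

  # keep the rule across reboots on RHEL/Scientific Linux style systems
  service iptables save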
> On Dec 24, 2007 5:29 PM, Aaron Knister <[EMAIL PROTECTED]> wrote:
> > That sounds like quite a task! Could you show me the contents of your firewall rules (iptables -L) on each of the systems mentioned below? That would help to diagnose the problem further.
> >
> > -Aaron
>
> On Dec 24, 2007, at 1:21 AM, Yan Benhammou wrote:
> > Hi Aaron, and thank you for your fast answers.
> > We (Avi, Meny and I) are working on the Israeli GRID, and we need to create a single huge file system for it.
> >
> > Cheers,
> > Yan
> >
> > From: Aaron Knister [mailto:[EMAIL PROTECTED]
> > Sent: Sun 12/23/2007 8:27 PM
> > To: Avi Gershon
> > Cc: [email protected]; Yan Benhammou; Meny Ben moshe
> > Subject: Re: [Lustre-discuss] help needed.
> >
> > Can you check the firewall on each of those machines (iptables -L) and paste that here? Also, is this network dedicated to Lustre? Lustre can easily saturate a network interface under load, to the point that it becomes difficult to log in to a node if it only has one interface. I'd recommend using a different interface if you can.
>
> On Dec 23, 2007, at 11:03 AM, Avi Gershon wrote:
> > node 1: 132.66.176.212
> > node 2: 132.66.176.215
> >
> > [EMAIL PROTECTED] ~]# ssh 132.66.176.215
> > [EMAIL PROTECTED]'s password:
> > Last login: Sun Dec 23 14:32:51 2007 from x-math20.tau.ac.il
> > [EMAIL PROTECTED] ~]# lctl ping [EMAIL PROTECTED]
> > failed to ping [EMAIL PROTECTED]: Input/output error
> > [EMAIL PROTECTED] ~]# lctl list_nids
> > [EMAIL PROTECTED]
> > [EMAIL PROTECTED] ~]# ssh 132.66.176.212
> > The authenticity of host '132.66.176.212 (132.66.176.212)' can't be established.
> > RSA1 key fingerprint is 85:2a:c1:47:84:b7:b5:a6:cd:c4:57:86:af:ce:7e:74.
> > Are you sure you want to continue connecting (yes/no)? yes
> > [EMAIL PROTECTED]'s password:
> > Last login: Sun Dec 23 15:24:41 2007 from x-math20.tau.ac.il
> > [EMAIL PROTECTED] ~]# lctl ping [EMAIL PROTECTED]
> > failed to ping [EMAIL PROTECTED]: Input/output error
> > [EMAIL PROTECTED] ~]# lctl list_nids
> > [EMAIL PROTECTED]
> > [EMAIL PROTECTED] ~]#
> >
> > Thanks for helping!!
> > Avi
>
> On Dec 23, 2007 5:32 PM, Aaron Knister <[EMAIL PROTECTED]> wrote:
> > On the OSS, can you ping the MDS/MGS using this command:
> >
> >   lctl ping [EMAIL PROTECTED]
> >
> > If it doesn't ping, list the NIDs on each node by running
> >
> >   lctl list_nids
> >
> > and tell me what comes back.
> >
> > -Aaron
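To run that check from each machine against all of the others in one go, a small loop like the following works; the NID list and the @tcp0 suffix are assumptions based on the addresses quoted above, so adjust them to match your own lctl list_nids output:

  # LNET reachability check: run on each of the three nodes
  for nid in 132.66.176.211@tcp0 132.66.176.212@tcp0 132.66.176.215@tcp0; do
      echo "=== $nid ==="
      lctl ping "$nid" || echo "LNET ping to $nid FAILED"
  done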
> On Dec 23, 2007, at 9:22 AM, Avi Gershon wrote:
> > Hi, I could use some help.
> > I installed Lustre on 3 computers.
> >
> > MDT/MGS:
> >
> > [EMAIL PROTECTED] ~]# mkfs.lustre --reformat --fsname spfs --mdt --mgs /dev/hdb
> >
> >    Permanent disk data:
> > Target:     spfs-MDTffff
> > Index:      unassigned
> > Lustre FS:  spfs
> > Mount type: ldiskfs
> > Flags:      0x75
> >             (MDT MGS needs_index first_time update )
> > Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> > Parameters:
> >
> > device size = 19092MB
> > formatting backing filesystem ldiskfs on /dev/hdb
> >         target name   spfs-MDTffff
> >         4k blocks     0
> >         options       -J size=400 -i 4096 -I 512 -q -O dir_index -F
> > mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-MDTffff -J size=400 -i 4096 -I 512 -q -O dir_index -F /dev/hdb
> > Writing CONFIGS/mountdata
> >
> > [EMAIL PROTECTED] ~]# df
> > Filesystem   1K-blocks      Used  Available Use% Mounted on
> > /dev/hda1     19228276   4855244   13396284  27% /
> > none            127432         0     127432   0% /dev/shm
> > /dev/hdb      17105436    455152   15672728   3% /mnt/test/mdt
> >
> > [EMAIL PROTECTED] ~]# cat /proc/fs/lustre/devices
> >   0 UP mgs MGS MGS 5
> >   1 UP mgc [EMAIL PROTECTED] 5f5ba729-6412-3843-2229-1310a0b48f71 5
> >   2 UP mdt MDS MDS_uuid 3
> >   3 UP lov spfs-mdtlov spfs-mdtlov_UUID 4
> >   4 UP mds spfs-MDT0000 spfs-MDT0000_UUID 3
> > [EMAIL PROTECTED] ~]#
> >
> > So you can see that the MGS is up, but on the OSTs I get an error!! Please help...
> >
> > OST:
> >
> > [EMAIL PROTECTED] ~]# mkfs.lustre --reformat --fsname spfs --ost --mgsnode=132.66.[EMAIL PROTECTED] /dev/hdb1
> >
> >    Permanent disk data:
> > Target:     spfs-OSTffff
> > Index:      unassigned
> > Lustre FS:  spfs
> > Mount type: ldiskfs
> > Flags:      0x72
> >             (OST needs_index first_time update )
> > Persistent mount opts: errors=remount-ro,extents,mballoc
> > Parameters: [EMAIL PROTECTED]
> >
> > device size = 19594MB
> > formatting backing filesystem ldiskfs on /dev/hdb1
> >         target name   spfs-OSTffff
> >         4k blocks     0
> >         options       -J size=400 -i 16384 -I 256 -q -O dir_index -F
> > mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-OSTffff -J size=400 -i 16384 -I 256 -q -O dir_index -F /dev/hdb1
> > Writing CONFIGS/mountdata
> >
> > [EMAIL PROTECTED] ~]# /CONFIGS/mountdata
> > -bash: /CONFIGS/mountdata: No such file or directory
> > [EMAIL PROTECTED] ~]# mount -t lustre /dev/hdb1 /mnt/test/ost1
> > mount.lustre: mount /dev/hdb1 at /mnt/test/ost1 failed: Input/output error
> > Is the MGS running?
> >
> > Can anyone point out the problem?
> > Thanks, Avi.
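Since the OST mount fails with "Is the MGS running?", it is worth double-checking the bring-up order and the LNET path to the MGS before anything else. A sketch using the devices and mount points from the output above (the @tcp0 suffix is an assumption; substitute the NID your MGS actually reports):

  # 1. on the MGS/MDT node (132.66.176.211): mount the MDT/MGS target first
  mount -t lustre /dev/hdb /mnt/test/mdt

  # 2. from each OST node, confirm LNET (not just ICMP) can reach the MGS
  lctl ping 132.66.176.211@tcp0

  # 3. only once that ping succeeds, mount the OST
  mount -t lustre /dev/hdb1 /mnt/test/ost1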
Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies
(301) 595-7000
[EMAIL PROTECTED]

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
