The fix Roland mentions is included in Lustre 1.4.10; you can also find it here: https://bugzilla.lustre.org/attachment.cgi?id=8709
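If you are not sure which version a given server is already running, a quick check (this assumes the procfs layout used by the 1.4.x/1.6.x releases) is:

  # print the Lustre version the loaded modules report
  cat /proc/fs/lustre/version

  # the userspace tools print their own version as well
  lctl --version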
-therese (HP SFS Support)

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: 02 January 2008 13:23
To: [email protected]
Subject: Lustre-discuss Digest, Vol 24, Issue 2

Today's Topics:

   1. lustre quota problems (Patrick Winnertz)
   2. Re: lustre quota problems (Roland Laifer)
   3. Re: help needed. (Aaron Knister)

----------------------------------------------------------------------

Message: 1
Date: Wed, 2 Jan 2008 11:27:56 +0100
From: Patrick Winnertz <[EMAIL PROTECTED]>
Subject: [Lustre-discuss] lustre quota problems
To: Lustre-discuss <[email protected]>

Hello,

I have run into problems with quotas on our test cluster: when I set a user's quota to a given value (e.g. the values used in the operations manual), I can write exactly the amount set with setquota. But after I delete the file(s), I cannot use that space again.

Here is what I did in detail:

  lfs quotacheck -ug /mnt/testfs
  lfs setquota -u winnie 307200 309200 10000 11000 /mnt/testfs

Then I wrote one single big file with dd:

  dd if=/dev/zero of=/mnt/testfs/test

As expected, it stops writing once the file is about 300 MB. But after removing the file, restarting dd produces a zero-sized file, because the disk quota is still reported as exceeded.

Has anybody seen this behaviour, and does anyone know what is wrong here? (I suspect some values are cached.)

Thanks in advance!
Patrick Winnertz

--
Patrick Winnertz
Tel.: +49 (0) 2161 / 4643 - 0
credativ GmbH, HRB Mönchengladbach 12080
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz

------------------------------

Message: 2
Date: Wed, 2 Jan 2008 11:51:28 +0100
From: Roland Laifer <[EMAIL PROTECTED]>
Subject: Re: [Lustre-discuss] lustre quota problems
To: Patrick Winnertz <[EMAIL PROTECTED]>
Cc: Lustre-discuss <[email protected]>

Hello,

we had the same problem with our Lustre software from HP (HP SFS). HP opened CFS bug 12431 about it, which is visible neither to the public nor to us, so I am not sure which Lustre version includes the corresponding fix. HP provided a fix on top of their newest SFS version which solved the problem.

Here is part of the explanation of the problem: files which did not decrease the quota when they were deleted had inode->i_dquot set to NULL, which should not happen. The root cause was in filter_destroy() and filter_commitrw_commit().
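For anyone who wants to verify whether their installation is affected (or that an upgrade fixed it), a minimal reproduction along the lines of Patrick's test should do; the user name, limits and mount point below are just his example values:

  # enable quota accounting and set limits for the test user
  lfs quotacheck -ug /mnt/testfs
  lfs setquota -u winnie 307200 309200 10000 11000 /mnt/testfs

  # write until the block limit is hit, then delete the file
  dd if=/dev/zero of=/mnt/testfs/test bs=1M
  rm /mnt/testfs/test

  # report the user's usage: on an affected system the used kbytes stay
  # near the 307200 soft limit even though the file is gone; on a fixed
  # system they drop back towards zero
  lfs quota -u winnie /mnt/testfs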
Regards,
Roland

--
Roland Laifer
Rechenzentrum, Universitaet Karlsruhe (TH), D-76128 Karlsruhe, Germany
Email: [EMAIL PROTECTED], Phone: +49 721 608 4861, Fax: +49 721 32550
Web: www.rz.uni-karlsruhe.de/personen/roland.laifer

------------------------------

Message: 3
Date: Wed, 2 Jan 2008 08:22:38 -0500
From: Aaron Knister <[EMAIL PROTECTED]>
Subject: Re: [Lustre-discuss] help needed.
To: Avi Gershon <[EMAIL PROTECTED]>
Cc: Yan Benhammou <[EMAIL PROTECTED]>, [email protected], Meny Ben moshe <[EMAIL PROTECTED]>

On the host x-math20, could you run "lctl list_nids" and also "ifconfig -a"? I want to see whether LNET is listening on the correct interface. Could you also post the contents of your /etc/modprobe.conf?

Thanks!

-Aaron
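For context on why modprobe.conf matters here: LNET chooses the interface it listens on from the lnet module options, so if that line points at the wrong NIC the NID it registers will not match the address the other nodes try to ping. A minimal sketch of the relevant line (eth0 is only a placeholder; use whichever interface carries the 132.66.176.x addresses):

  # /etc/modprobe.conf -- bind LNET's tcp network to a specific interface
  options lnet networks=tcp0(eth0)

  # reload the modules afterwards so the NID is re-registered
  lustre_rmmod
  modprobe lnet
  lctl network up
  lctl list_nids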
On Jan 2, 2008, at 4:42 AM, Avi Gershon wrote:

> Hello everyone, and happy new year.
> I think I have reduced my problem to this: "lctl ping [EMAIL PROTECTED]" does not work for me, for some strange reason, as you can see:
>
> [EMAIL PROTECTED] ~]# lctl ping [EMAIL PROTECTED]
> failed to ping [EMAIL PROTECTED]: Input/output error
> [EMAIL PROTECTED] ~]# ping 132.66.176.211
> PING 132.66.176.211 (132.66.176.211) 56(84) bytes of data.
> 64 bytes from 132.66.176.211: icmp_seq=0 ttl=64 time=0.152 ms
> 64 bytes from 132.66.176.211: icmp_seq=1 ttl=64 time=0.130 ms
> 64 bytes from 132.66.176.211: icmp_seq=2 ttl=64 time=0.131 ms
> --- 132.66.176.211 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2018ms
> rtt min/avg/max/mdev = 0.130/0.137/0.152/0.016 ms, pipe 2
> [EMAIL PROTECTED] ~]#
>
> On 12/24/07, Avi Gershon <[EMAIL PROTECTED]> wrote:
> > Hi,
> > here are the "iptables -L" results:
> >
> > NODE 1, 132.66.176.212 (Scientific Linux CERN SLC release 4.6 (Beryllium)):
> >
> > [EMAIL PROTECTED] ~]# iptables -L
> > Chain INPUT (policy ACCEPT)
> > target     prot opt source      destination
> >
> > Chain FORWARD (policy ACCEPT)
> > target     prot opt source      destination
> >
> > Chain OUTPUT (policy ACCEPT)
> > target     prot opt source      destination
> >
> > MDT, 132.66.176.211:
> >
> > [EMAIL PROTECTED] ~]# iptables -L
> > Chain INPUT (policy ACCEPT)
> > target     prot opt source      destination
> >
> > Chain FORWARD (policy ACCEPT)
> > target     prot opt source      destination
> >
> > Chain OUTPUT (policy ACCEPT)
> > target     prot opt source      destination
> >
> > NODE 2, 132.66.176.215:
> >
> > [EMAIL PROTECTED] ~]# iptables -L
> > Chain INPUT (policy ACCEPT)
> > target     prot opt source      destination
> > RH-Firewall-1-INPUT  all  --  anywhere    anywhere
> >
> > Chain FORWARD (policy ACCEPT)
> > target     prot opt source      destination
> > RH-Firewall-1-INPUT  all  --  anywhere    anywhere
> >
> > Chain OUTPUT (policy ACCEPT)
> > target     prot opt source      destination
> >
> > Chain RH-Firewall-1-INPUT (2 references)
> > target     prot opt source      destination
> > ACCEPT     all  --  anywhere    anywhere
> > ACCEPT     icmp --  anywhere    anywhere     icmp any
> > ACCEPT     ipv6-crypt--  anywhere    anywhere
> > ACCEPT     ipv6-auth--  anywhere    anywhere
> > ACCEPT     udp  --  anywhere    224.0.0.251  udp dpt:5353
> > ACCEPT     udp  --  anywhere    anywhere     udp dpt:ipp
> > ACCEPT     all  --  anywhere    anywhere     state RELATED,ESTABLISHED
> > ACCEPT     tcp  --  anywhere    anywhere     state NEW tcp dpts:30000:30101
> > ACCEPT     tcp  --  anywhere    anywhere     state NEW tcp dpt:ssh
> > ACCEPT     udp  --  anywhere    anywhere     state NEW udp dpt:afs3-callback
> > REJECT     all  --  anywhere    anywhere     reject-with icmp-host-prohibited
> > [EMAIL PROTECTED] ~]#
> >
> > One more thing: do you use the TCP protocol, or UDP?
> >
> > Regards, Avi
> > P.S. I think this is the beginning of a beautiful friendship.. :-)
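From the listings above, node 2 (132.66.176.215) is the only machine with a restrictive RH-Firewall-1-INPUT chain, and it ends in a catch-all REJECT with no rule for the Lustre port. On the protocol question: LNET over Ethernet (socklnd) uses TCP, not UDP, and listens on port 988 by default. A possible fix, as a sketch to be checked against your own site policy, would be to open that port on node 2:

  # on 132.66.176.215: allow incoming LNET/Lustre traffic (TCP port 988)
  iptables -I RH-Firewall-1-INPUT -p tcp --dport 988 -j ACCEPT

  # keep the rule across reboots on RHEL/Scientific Linux style systems
  service iptables save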
> On Dec 24, 2007 5:29 PM, Aaron Knister <[EMAIL PROTECTED]> wrote:
> > That sounds like quite a task! Could you show me the contents of your firewall rules (iptables -L) on each of the systems mentioned below? That would help to diagnose the problem further.
> >
> > -Aaron
>
> On Dec 24, 2007, at 1:21 AM, Yan Benhammou wrote:
> > Hi Aaron, and thank you for your fast answers.
> > We (Avi, Meny and I) are working on the Israeli GRID, and we need to create a single huge file system for it.
> >
> > Cheers,
> > Yan
> >
> > From: Aaron Knister [mailto:[EMAIL PROTECTED]
> > Sent: Sun 12/23/2007 8:27 PM
> > To: Avi Gershon
> > Cc: [email protected]; Yan Benhammou; Meny Ben moshe
> > Subject: Re: [Lustre-discuss] help needed.
> >
> > Can you check the firewall on each of those machines (iptables -L) and paste that here? Also, is this network dedicated to Lustre? Lustre can easily saturate a network interface under load, to the point that it becomes difficult to log in to a node if it only has one interface. I'd recommend using a different interface if you can.
>
> On Dec 23, 2007, at 11:03 AM, Avi Gershon wrote:
> > node 1: 132.66.176.212
> > node 2: 132.66.176.215
> >
> > [EMAIL PROTECTED] ~]# ssh 132.66.176.215
> > [EMAIL PROTECTED]'s password:
> > Last login: Sun Dec 23 14:32:51 2007 from x-math20.tau.ac.il
> > [EMAIL PROTECTED] ~]# lctl ping [EMAIL PROTECTED]
> > failed to ping [EMAIL PROTECTED]: Input/output error
> > [EMAIL PROTECTED] ~]# lctl list_nids
> > [EMAIL PROTECTED]
> > [EMAIL PROTECTED] ~]# ssh 132.66.176.212
> > The authenticity of host '132.66.176.212 (132.66.176.212)' can't be established.
> > RSA1 key fingerprint is 85:2a:c1:47:84:b7:b5:a6:cd:c4:57:86:af:ce:7e:74.
> > Are you sure you want to continue connecting (yes/no)? yes
> > [EMAIL PROTECTED]'s password:
> > Last login: Sun Dec 23 15:24:41 2007 from x-math20.tau.ac.il
> > [EMAIL PROTECTED] ~]# lctl ping [EMAIL PROTECTED]
> > failed to ping [EMAIL PROTECTED]: Input/output error
> > [EMAIL PROTECTED] ~]# lctl list_nids
> > [EMAIL PROTECTED]
> > [EMAIL PROTECTED] ~]#
> >
> > Thanks for helping!!
> > Avi
>
> On Dec 23, 2007 5:32 PM, Aaron Knister <[EMAIL PROTECTED]> wrote:
> > On the OSS, can you ping the MDS/MGS using this command:
> >
> >   lctl ping [EMAIL PROTECTED]
> >
> > If it doesn't ping, list the NIDs on each node by running
> >
> >   lctl list_nids
> >
> > and tell me what comes back.
> >
> > -Aaron
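To run that check from each machine against all of the others in one go, a small loop like the following works; the NID list and the @tcp0 suffix are assumptions based on the addresses quoted above, so adjust them to match your own lctl list_nids output:

  # LNET reachability check: run on each of the three nodes
  for nid in 132.66.176.211@tcp0 132.66.176.212@tcp0 132.66.176.215@tcp0; do
      echo "=== $nid ==="
      lctl ping "$nid" || echo "LNET ping to $nid FAILED"
  done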
> On Dec 23, 2007, at 9:22 AM, Avi Gershon wrote:
> > Hi, I could use some help.
> > I installed Lustre on 3 computers.
> >
> > MDT/MGS:
> >
> > [EMAIL PROTECTED] ~]# mkfs.lustre --reformat --fsname spfs --mdt --mgs /dev/hdb
> >
> >    Permanent disk data:
> > Target:     spfs-MDTffff
> > Index:      unassigned
> > Lustre FS:  spfs
> > Mount type: ldiskfs
> > Flags:      0x75
> >             (MDT MGS needs_index first_time update )
> > Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> > Parameters:
> >
> > device size = 19092MB
> > formatting backing filesystem ldiskfs on /dev/hdb
> >         target name   spfs-MDTffff
> >         4k blocks     0
> >         options       -J size=400 -i 4096 -I 512 -q -O dir_index -F
> > mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-MDTffff -J size=400 -i 4096 -I 512 -q -O dir_index -F /dev/hdb
> > Writing CONFIGS/mountdata
> >
> > [EMAIL PROTECTED] ~]# df
> > Filesystem   1K-blocks      Used  Available Use% Mounted on
> > /dev/hda1     19228276   4855244   13396284  27% /
> > none            127432         0     127432   0% /dev/shm
> > /dev/hdb      17105436    455152   15672728   3% /mnt/test/mdt
> >
> > [EMAIL PROTECTED] ~]# cat /proc/fs/lustre/devices
> >   0 UP mgs MGS MGS 5
> >   1 UP mgc [EMAIL PROTECTED] 5f5ba729-6412-3843-2229-1310a0b48f71 5
> >   2 UP mdt MDS MDS_uuid 3
> >   3 UP lov spfs-mdtlov spfs-mdtlov_UUID 4
> >   4 UP mds spfs-MDT0000 spfs-MDT0000_UUID 3
> > [EMAIL PROTECTED] ~]#
> >
> > So you can see that the MGS is up, but on the OSTs I get an error!! Please help...
> >
> > OST:
> >
> > [EMAIL PROTECTED] ~]# mkfs.lustre --reformat --fsname spfs --ost --mgsnode=132.66.[EMAIL PROTECTED] /dev/hdb1
> >
> >    Permanent disk data:
> > Target:     spfs-OSTffff
> > Index:      unassigned
> > Lustre FS:  spfs
> > Mount type: ldiskfs
> > Flags:      0x72
> >             (OST needs_index first_time update )
> > Persistent mount opts: errors=remount-ro,extents,mballoc
> > Parameters: [EMAIL PROTECTED]
> >
> > device size = 19594MB
> > formatting backing filesystem ldiskfs on /dev/hdb1
> >         target name   spfs-OSTffff
> >         4k blocks     0
> >         options       -J size=400 -i 16384 -I 256 -q -O dir_index -F
> > mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-OSTffff -J size=400 -i 16384 -I 256 -q -O dir_index -F /dev/hdb1
> > Writing CONFIGS/mountdata
> >
> > [EMAIL PROTECTED] ~]# /CONFIGS/mountdata
> > -bash: /CONFIGS/mountdata: No such file or directory
> > [EMAIL PROTECTED] ~]# mount -t lustre /dev/hdb1 /mnt/test/ost1
> > mount.lustre: mount /dev/hdb1 at /mnt/test/ost1 failed: Input/output error
> > Is the MGS running?
> >
> > Can anyone point out the problem?
> > Thanks, Avi.
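Since the OST mount fails with "Is the MGS running?", it is worth double-checking the bring-up order and the LNET path to the MGS before anything else. A sketch using the devices and mount points from the output above (the @tcp0 suffix is an assumption; substitute the NID your MGS actually reports):

  # 1. on the MGS/MDT node (132.66.176.211): mount the MDT/MGS target first
  mount -t lustre /dev/hdb /mnt/test/mdt

  # 2. from each OST node, confirm LNET (not just ICMP) can reach the MGS
  lctl ping 132.66.176.211@tcp0

  # 3. only once that ping succeeds, mount the OST
  mount -t lustre /dev/hdb1 /mnt/test/ost1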
Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies
(301) 595-7000
[EMAIL PROTECTED]

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
