Re: [Lustre-discuss] 0 byte files and ll_readdir() error

Daniel Leaberry Wed, 25 Apr 2007 07:38:54 -0700

On Apr 24, 2007  08:42 -0600, Daniel Leaberry wrote:

We're running 1.6b7 and have noticed the following two problems. I'm wondering if they're correlated.
1. We get files that are 0 bytes. They have nothing in them.


This may or may not be related to the recent bug 12181 problem.
That bug will be fixed in 1.6.0+ and 1.4.10.1 and 1.4.11+.

It can also happen if the clients are evicted while they are
writing to the file.
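One way to see how many files an eviction may have left empty is to scan the affected tree for regular files whose size is zero. Below is a minimal Python sketch. `find_zero_byte_files` is a hypothetical helper name, not a Lustre tool. It only calls `lstat()` on each entry, and it cannot tell a legitimately empty file from one whose data was lost.

```python
import os
import stat


def find_zero_byte_files(root):
    """Return sorted paths of regular files under `root` whose size is 0 bytes.

    Hypothetical audit helper: it calls lstat() on each entry and never
    opens or modifies a file. Symlinks are not followed.
    """
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                # The file vanished or cannot be stat'ed; skip it.
                continue
            # Count only regular files, not symlinks, sockets, etc.
            if stat.S_ISREG(st.st_mode) and st.st_size == 0:
                empty.append(path)
    return sorted(empty)
```

For example, `find_zero_byte_files("/mnt/lustre/xml")` (a hypothetical path) would list candidates that can then be compared against the times of client-eviction messages on the MDS.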

I think I've figured out why this happened, but I'm not sure my explanation is valid. We run Lustre as more of a general-purpose filesystem, though usually with larger files. We use autofs to mount and unmount filesystems. The timeout is set to 120 seconds (after that much inactivity the filesystem is unmounted).

Here is what I think happened on one machine that was accessed infrequently and only with small files. A batch of XML files would be written and the metadata created on the MDS (hence the zero-byte files), but because Lustre tries to optimize RPCs for 1 MB I/Os and the client caches data, the data would not yet be written to the OSTs. Then autofs would unmount the filesystem without flushing the write buffers (which doesn't make sense), and a few minutes later I would get a client-evicted message on the MDS. Since the client was evicted, all of its cached data was discarded and lost.

I'm not sure why autofs unmounting the filesystem wouldn't flush the buffers, and I'm also not sure why unmounting doesn't seem to inform the MDS that the client is leaving. I know Lustre probably isn't expecting to be mounted and unmounted every 5 minutes, but is this expected behavior?
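If the explanation above is right (dirty data still sitting in the client cache when the client goes away), one application-side mitigation is to force each file's data out to storage before closing it. Below is a minimal Python sketch, assuming the program that writes the XML files can be modified; `write_durably` is a hypothetical name. It does not explain why the unmount failed to flush or why the MDS was not told the client was leaving; it only narrows the window in which cached data can be lost.

```python
import os


def write_durably(path, data):
    """Write `data` (bytes) to `path` and ask for it to reach storage before returning.

    flush() moves Python's userspace buffer into the kernel; os.fsync()
    then blocks until the kernel reports the file's dirty data written out
    (on a network filesystem, to the servers).
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # userspace buffer -> kernel page cache
        os.fsync(f.fileno())   # kernel page cache -> storage
```

Each `fsync()` waits for the write to complete, so with many small files this trades throughput for durability.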

2. We get these errors across our 30 nodes:
LustreError: 7030:0:(dir.c:330:ll_readdir()) error reading dir 167108765/2378987153 page 13: rc -5
LustreError: 7029:0:(dir.c:330:ll_readdir()) error reading dir 171699532/2388399554 page 9: rc -5
LustreError: 7027:0:(dir.c:330:ll_readdir()) error reading dir 171403580/2387428410 page 2: rc -5
LustreError: 6990:0:(dir.c:330:ll_readdir()) error reading dir 171011300/2386583645 page 8: rc -5
LustreError: 7027:0:(dir.c:330:ll_readdir()) error reading dir 172286916/2390172901 page 13: rc -5
LustreError: 6990:0:(dir.c:330:ll_readdir()) error reading dir 172030180/2388919021 page 13: rc -5
LustreError: 7027:0:(dir.c:330:ll_readdir()) error reading dir 172321971/2390308492 page 3: rc -5
LustreError: 7027:0:(dir.c:330:ll_readdir()) error reading dir 163603484/1208913504 page 8: rc -5
LustreError: 6990:0:(dir.c:330:ll_readdir()) error reading dir 172748079/2390802528 page 13: rc -5
LustreError: 9133:0:(dir.c:330:ll_readdir()) error reading dir 172818070/2390892206 page 2: rc -5
LustreError: 9171:0:(dir.c:330:ll_readdir()) error reading dir 168359805/2380837293 page 8: rc -5
LustreError: 9187:0:(dir.c:330:ll_readdir()) error reading dir 163706128/1209056171 page 7: rc -5
LustreError: 9199:0:(dir.c:330:ll_readdir()) error reading dir 165116087/1211142674 page 0: rc -5
LustreError: 9217:0:(dir.c:330:ll_readdir()) error reading dir 162005170/1206582728 page 12: rc -5
LustreError: 9216:0:(dir.c:330:ll_readdir()) error reading dir 162686166/1207618778 page 12: rc -5
LustreError: 6990:0:(dir.c:330:ll_readdir()) error reading dir 163079284/1208141145 page 3: rc -5
These are reporting I/O errors while reading directories from the MDS.
This isn't a problem I've seen before, and it's hard to say what the
root cause is.
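To check whether the same directories fail on every node or each client hits different ones, the messages can be tallied per directory. Below is a rough Python sketch. The regex is built from the message format quoted above; treating the `dir N/M` pair as inode number and generation is an assumption, and `rc -5` is read as a negated errno value (`-EIO`). The function names are hypothetical.

```python
import errno
import re

# Built from the ll_readdir() messages quoted above. Treating the
# "dir N/M" pair as inode number / generation is an assumption.
READDIR_ERR = re.compile(
    r"LustreError: \d+:\d+:\(dir\.c:\d+:ll_readdir\(\)\) "
    r"error reading dir\s*(?P<ino>\d+)/(?P<gen>\d+) "
    r"page (?P<page>\d+): rc (?P<rc>-?\d+)"
)


def parse_readdir_errors(text):
    """Yield one dict per ll_readdir() error message found in `text`."""
    for m in READDIR_ERR.finditer(text):
        rc = int(m.group("rc"))
        yield {
            "ino": int(m.group("ino")),
            "gen": int(m.group("gen")),
            "page": int(m.group("page")),
            "rc": rc,
            # Kernel-style return codes are negated errno values (-5 -> EIO).
            "errno": errno.errorcode.get(-rc, "unknown"),
        }


def affected_dirs(text):
    """Return the distinct (ino, gen) pairs that hit a readdir error."""
    return {(e["ino"], e["gen"]) for e in parse_readdir_errors(text)}
```

Running `affected_dirs()` over each node's syslog and comparing the resulting sets would show whether the failures cluster on particular directories.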

Is it possible the clients are just messed up, especially since I get no errors on the MDS? I suppose this might be due to our autofs mounting and unmounting so many times.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
