>>> On Mon, 10 Jul 2006 13:34:05 -0700, Alex Lisker
>>> <[EMAIL PROTECTED]> said:
[ ... ]

lisker> ERROR: (device sdc): diRead: i_ino != di_number

That is in 'fs/jfs/jfs_imap.c:diRead()' and means that when an inode
was read from a disk block, its number was different from the number
expected. This can happen either because a nonsensical number was
requested or, more likely, because the on-disk page is wrong or has
been corrupted. There is a 'jfs_info()' call in that function that
prints the value of the requested number; perhaps you could enable
debugging in your JFS.

lisker> [ ... ] EonStore's RAID5 [ ... ]

As to this, I would mention http://WWW.BAARF.com/ as in your
situation RAID5 seems particularly inappropriate (not that it is
ever very appropriate).

lisker> [ ... ] logical drives of 1.6Tb in size each. [ ... ]

lisker> Also, fsck

And what does 'fsck' report? If the on-disk structures are corrupted
it should complain.

lisker> of this partition takes 7 hours, whereas the other two
lisker> partitions take about 2 hours each to fsck.

Very interesting numbers; this is the first time I have seen them
for largish JFS partitions. Pretty amazingly fast, even at 7 hours.
BTW, I am assuming here that "b" means ''byte'' and not ''bit''.

lisker> `du -sh' took more than 24 hours, and produced a total
lisker> of 660Gb!

24 hours for 'du -sh' when 'fsck' takes only 7 hours suggests indeed
very many, very scattered small files...

lisker> Seems to me that the problem occurs when he's trying to
lisker> manage files. For example, today he tried to delete a
lisker> large directory right before the problem occurred.

Uhm, that depends on *how many* small files he has got. There is a
limit to how many entries a directory can have, or how many inodes a
filesystem can have, even in JFS. I wrote a list of limits in my
filesystem page here:

  http://WWW.sabi.co.UK/Notes/linuxFS.html#fsFeats

and for JFS they are more or less 2/4 billion files per directory or
per filesystem, and 65 thousand subdirectories per directory.

660GB in many small files? How many files?
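One way to answer that directly, rather than by estimating, would be
something like the sketch below ('/home/user' is a placeholder for
the problematic home directory, and on a tree this size it will be
slow, but it answers both the file-count and the fan-out question):

```shell
# Sketch: count files and directories under the suspect subtree.
# '/home/user' is a placeholder for the actual problematic home.
find /home/user -xdev -type f | wc -l   # total files
find /home/user -xdev -type d | wc -l   # total directories

# Worst per-directory subdirectory fan-out, to compare against the
# roughly 65 thousand subdirectories-per-directory JFS limit.
find /home/user -xdev -type d -exec sh -c \
  'echo "$(find "$1" -mindepth 1 -maxdepth 1 -type d | wc -l) $1"' _ {} \; |
  sort -rn | head
```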
Let's say each takes one 4KiB block; that is almost 160 million
files, which is not a lot compared to the 2/4 billion limit. But
what is their organization? How many subdirectories? Perhaps the
65 thousand limit is being silently exceeded; I wonder if there is
any check of that.

Also, let's see: 160m 4KiB files, and how long are the names of
those files? JFS uses something like VSAM, so it is pretty efficient
with indices IIRC. However, for comparison, my '/usr/bin' directory:

  # ls -ld /usr/bin
  drwxr-xr-x 2 root root 221184 Jul 10 06:06 /usr/bin
  # cd /usr/bin && ls | wc -cl
     3823   37423

It has about 3,800 entries, each with a name on average 10
characters long, and the directory itself takes about 220KB. If we
extrapolate that linearly to 160m entries, that would imply a
directory size of roughly 9-10GB, not that big after all :-).

[ ... ]

lisker> Perhaps, smaller partitions? [ ... ]

Seems rather pointless to me.

lisker> Also, what is the best possible strategy to stress-test
lisker> my hardware and a file system as close to my environment
lisker> as possible:

Whatever your users are doing seems to be fine for a stress
test... :-)

lisker> [ ... ] The system is under certain amount of stress
lisker> 24/7, it's being used to host user's homes. The
lisker> problematic jfs partition is a home of one particularly
lisker> notorious user which has managed to generate a literally
lisker> _enormous_ amount of tiny files. [ ... ]

This may be an issue, as it may exceed the range of limits within
which the system (and not just JFS) has been tested so far. I
wonder how many people have tested Linux (the kernel) and its
various drivers and options with filesystems like yours. Not many,
I suppose.

lisker> [ ... ] large partitions under constant IO stress
lisker> occupied by a several dozen of users' homes [ ... ]

This does not seem a big deal to me; quite a few people do that.

lisker> made available via NFS?

Ahhhhhhhh, the big important detail comes last.
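To make the arithmetic above explicit, here is a back-of-envelope
check using only the figures quoted in this thread (660GB of data,
one 4KiB block per file, and the 221184-byte /usr/bin directory with
3823 entries):

```shell
# Back-of-envelope check of the estimates above, using only figures
# from this thread; nothing here touches the disk.
files=$(( 660 * 1000 * 1000 * 1000 / 4096 ))  # 660GB, one 4KiB block per file
echo "estimated file count: $files"           # ~161 million, i.e. "almost 160m"

# /usr/bin sample: 221184 bytes of directory space for 3823 entries
per_entry=$(( 221184 / 3823 ))                # ~57 bytes per entry
echo "directory bytes per entry: $per_entry"

# Linear extrapolation of directory size to the estimated file count
dir_gb=$(( files * per_entry / 1000000000 ))
echo "extrapolated directory size: ~${dir_gb}GB"  # ~9GB, in the 9-10GB range
```

(The extrapolation is linear, as in the text; a real JFS directory
B+-tree would also have some index overhead, so treat it as a lower
bound.)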
Especially if you are using the kernel-based NFS server, my
impression is that it is not that robust under stress; anything
could be happening (e.g. stray pointers corrupting in-memory data).
Note also that NFS, depending on the protocol version, may have much
smaller limits than JFS as to the number of entities it supports.
You may also want to try using Samba and CIFS even for Linux-to-Linux
file service.

However, the big thing is to check whether the same problems happen
if you do operations on the big subtree locally. It may be that you
are using JFS in circumstances in which it has not been used before,
and is thus untested, but the same may be true of NFS (or of the
kernel itself).

_______________________________________________
Jfs-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/jfs-discussion
