All,

A quick update on this issue: our application creates one new top-level
directory each day, and after about 500 days *all* of the servers I've checked
have corrupt root nodes. Even more troubling, after we repair a volume by
running jfs_fsck and recovering the data from lost+found (see below), the
problem recurs after about a month of creating new directories. However, if no
new top-level directories are created and only changes lower in the hierarchy
are made, the problem does not recur.

Does anyone have a theory about what is going on here? Is there anything we
can do to prevent it? Would moving all the data down one level (e.g., nesting
it under a single directory in the root) help? Or is the root node like any
other node, so that 500+ subdirectories in any one directory are too much for
JFS?
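For anyone who wants to watch for this condition, here is a minimal Python sketch, assuming the volumes are mounted and readable. It counts the directories directly under a mount point (it does not recurse) and warns once the count reaches a threshold. The threshold of 500 comes only from the rough tipping point observed above and is not a documented JFS limit. The function names are hypothetical.

```python
import os
import sys

# Empirical threshold from the observations in this thread (~500 top-level
# daily directories). This is NOT a documented JFS limit.
WARN_THRESHOLD = 500


def count_top_level_dirs(mount_point: str) -> int:
    """Count the directories directly under mount_point (non-recursive)."""
    with os.scandir(mount_point) as entries:
        # Do not follow symlinks, so only real directories are counted.
        return sum(1 for e in entries if e.is_dir(follow_symlinks=False))


def check_volume(mount_point: str, threshold: int = WARN_THRESHOLD) -> bool:
    """Print a status line and return True if the count is below threshold."""
    count = count_top_level_dirs(mount_point)
    ok = count < threshold
    status = "ok" if ok else "WARNING"
    print(f"{status}: {mount_point} has {count} top-level directories")
    return ok


if __name__ == "__main__":
    # Pass mount points on the command line; the paths are up to the caller.
    for path in sys.argv[1:]:
        check_volume(path)
```

Note that lost+found also counts as a top-level directory here. Because the count is only a symptom-based heuristic, this would flag volumes for a read-only `fsck.jfs -n` check. It does not prevent the corruption.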

Because these are older machines, they are all running Debian 4 with a
backported 2.6.26 kernel. Is there any chance that upgrading to Debian 5 and a
newer kernel would help?

Thanks in advance for any help :-)

Tim

On Aug 25, 2010, at 2:16 PM, Tim Nufire wrote:

> Hello,
> 
> I've got a problem that I'm hoping someone on this list can help me with...
> 
> Read-only fsck.jfs checks on my oldest volumes are reporting an alarming 
> number of corrupted root nodes despite the fact that these volumes appear to 
> be healthy when mounted read-only. Here's the error that I'm getting...
> 
> fsck.jfs -n -v /dev/md/10
> fsck.jfs version 1.1.14, 06-Apr-2009
> processing started: 8/13/2010 10.9.6
> The current device is:  /dev/md/10
> Open(...READONLY...) returned rc = 0
> Primary superblock is valid.
> The type of file system for the device is JFS.
> Block size in bytes:  4096
> Filesystem size in blocks:  4756914448
> **Phase 1 - Check Blocks, Files/Directories, and  Directory Entries
> Invalid data format detected in root directory.
> CANNOT CONTINUE.
> ERRORS HAVE BEEN DETECTED.  Run fsck with the -f parameter to repair.
> processing terminated:  8/13/2010 10:10:05  with return code: 10062  exit code: 4.
> 
> Despite the catastrophic-sounding error above, mounting the file system
> read-only and listing the root directory from the command line works fine:
> 
> ls 
> 20090110  20090303  20090418  20090605  20090721  20090914  20091030  20091215  20100130  20100317  20100502  20100617
> 20090111  20090304  20090419  20090606  20090722  20090915  20091031  20091216  20100131  20100318  20100503  20100618
> 20090113  20090305  20090420  20090607  20090723  20090916  20091101  20091217  20100201  20100319  20100504  20100619
> 20090114  20090306  20090421  20090608  20090724  20090917  20091102  20091218  20100202  20100320  20100505  20100620
> 20090115  20090307  20090422  20090609  20090725  20090918  20091103  20091219  20100203  20100321  20100506  20100622
> 20090116  20090308  20090423  20090610  20090727  20090919  20091104  20091220  20100204  20100322  20100507  20100623
> 20090117  20090309  20090424  20090611  20090728  20090920  20091105  20091221  20100205  20100323  20100508  20100624
> 20090118  20090310  20090425  20090612  20090729  20090921  20091106  20091222  20100206  20100324  20100509  20100625
> 20090119  20090311  20090426  20090613  20090730  20090922  20091107  20091223  20100207  20100325  20100510  20100626
> 20090120  20090312  20090427  20090614  20090731  20090923  20091108  20091224  20100208  20100326  20100511  20100627
> 20090121  20090313  20090428  20090615  20090801  20090924  20091109  20091225  20100209  20100327  20100512  20100628
> 20090122  20090314  20090429  20090616  20090802  20090925  20091110  20091226  20100210  20100328  20100513  20100629
> 20090123  20090315  20090430  20090617  20090803  20090926  20091111  20091227  20100211  20100329  20100514  20100630
> 20090126  20090316  20090501  20090618  20090804  20090927  20091112  20091228  20100212  20100330  20100515  20100701
> 20090127  20090317  20090502  20090619  20090805  20090928  20091113  20091229  20100213  20100331  20100516  20100702
> 20090128  20090318  20090503  20090620  20090809  20090929  20091114  20091230  20100214  20100401  20100517  20100703
> 20090129  20090319  20090504  20090621  20090810  20090930  20091115  20091231  20100215  20100402  20100518  20100704
> 20090130  20090320  20090505  20090622  20090811  20091001  20091116  20100101  20100216  20100403  20100519  20100705
> 20090202  20090321  20090506  20090623  20090812  20091002  20091117  20100102  20100217  20100404  20100520  20100706
> 20090204  20090322  20090507  20090624  20090813  20091003  20091118  20100103  20100218  20100405  20100521  20100707
> 20090205  20090323  20090508  20090625  20090814  20091004  20091119  20100104  20100219  20100406  20100522  20100708
> 20090206  20090324  20090509  20090626  20090815  20091005  20091120  20100105  20100220  20100407  20100523  20100709
> 20090207  20090325  20090510  20090627  20090816  20091006  20091121  20100106  20100221  20100408  20100524  20100710
> 20090208  20090326  20090511  20090628  20090817  20091007  20091122  20100107  20100222  20100409  20100525  20100711
> 20090209  20090327  20090512  20090629  20090818  20091008  20091123  20100108  20100223  20100410  20100526  20100712
> 20090210  20090328  20090513  20090630  20090819  20091009  20091124  20100109  20100224  20100411  20100527  20100713
> 20090211  20090329  20090514  20090701  20090820  20091010  20091125  20100110  20100225  20100412  20100528  20100714
> 20090212  20090330  20090515  20090702  20090821  20091011  20091126  20100111  20100226  20100413  20100529  20100715
> 20090213  20090331  20090516  20090703  20090822  20091012  20091127  20100112  20100227  20100414  20100530  20100716
> 20090214  20090401  20090517  20090704  20090823  20091013  20091128  20100113  20100228  20100415  20100531  20100717
> 20090215  20090402  20090518  20090705  20090824  20091014  20091129  20100114  20100301  20100416  20100601  20100718
> 20090216  20090403  20090519  20090706  20090825  20091015  20091130  20100115  20100302  20100417  20100602  20100719
> 20090217  20090404  20090520  20090707  20090826  20091016  20091201  20100116  20100303  20100418  20100603  20100720
> 20090218  20090405  20090521  20090708  20090827  20091017  20091202  20100117  20100304  20100419  20100604  20100721
> 20090219  20090406  20090522  20090709  20090828  20091018  20091203  20100118  20100305  20100420  20100605  20100722
> 20090220  20090407  20090523  20090710  20090901  20091019  20091204  20100119  20100306  20100421  20100606  20100723
> 20090221  20090408  20090524  20090711  20090902  20091020  20091205  20100120  20100307  20100422  20100607  20100724
> 20090222  20090409  20090527  20090712  20090903  20091021  20091206  20100121  20100308  20100423  20100608  20100725
> 20090223  20090410  20090528  20090713  20090904  20091022  20091207  20100122  20100309  20100424  20100609  20100726
> 20090224  20090411  20090529  20090714  20090905  20091023  20091208  20100123  20100310  20100425  20100610  20100727
> 20090225  20090412  20090530  20090715  20090906  20091024  20091209  20100124  20100311  20100426  20100611  20100728
> 20090226  20090413  20090531  20090716  20090907  20091025  20091210  20100125  20100312  20100427  20100612  20100729
> 20090227  20090414  20090601  20090717  20090908  20091026  20091211  20100126  20100313  20100428  20100613  mount_check
> 20090228  20090415  20090602  20090718  20090909  20091027  20091212  20100127  20100314  20100429  20100614
> 20090301  20090416  20090603  20090719  20090912  20091028  20091213  20100128  20100315  20100430  20100615
> 20090302  20090417  20090604  20090720  20090913  20091029  20091214  20100129  20100316  20100501  20100616
> 
> Running fsck.jfs read-write reinitializes the root node and moves all of
> its former contents into lost+found. I can recover the data from lost+found,
> so this is not fatal, but it is still something I would like to fix or avoid.
> 
> I have not repaired the above volume yet, but I have repaired others. Here is
> the fsck.jfs output from a read-write repair of a volume that had the same
> errors as those described above:
> 
> fsck.jfs -v /dev/md10
> fsck.jfs version 1.1.14, 06-Apr-2009
> processing started: 4/23/2010 4.32.24
> Using default parameter: -p
> The current device is:  /dev/md10
> Open(...READ/WRITE EXCLUSIVE...) returned rc = 0
> Primary superblock is valid.
> The type of file system for the device is JFS.
> Block size in bytes:  4096
> Filesystem size in blocks:  4756914448
> **Phase 0 - Replay Journal Log
> LOGREDO:  Log record for Sync Point at:    0x05774f34
> LOGREDO:  Beginning to update the Inode Allocation Map.
> LOGREDO:  Done updating the Inode Allocation Map.
> LOGREDO:  Beginning to update the Block Map.
> LOGREDO:  Incorrect leaf index detected (k=(d) 0, j=(d) 0, idx=(d) 0) while writing Block Map.
> LOGREDO:  Write Block Map control page failed in UpdateMaps().
> LOGREDO:  Unable to update map(s).
> logredo failed (rc=-231).  fsck continuing.
> **Phase 1 - Check Blocks, Files/Directories, and  Directory Entries
> Root directory has a corrupt tree.
> Initialized tree created for root directory.
> The root directory has an invalid data format.  Will correct.
> **Phase 2 - Count links
> **Phase 3 - Duplicate Block Rescan and Directory Connectedness
> **Phase 4 - Report Problems
> **Phase 5 - Check Connectivity
> **Phase 6 - Perform Approved Corrections
> Superblock marked dirty because repairs are about to be written.
> No \lost+found directory found in the filesystem.
> Directory inode  18661404 has been reconnected to /lost+found/.
> Directory inode  18637982 has been reconnected to /lost+found/.
> Directory inode  18614880 has been reconnected to /lost+found/.
> Directory inode  18595359 has been reconnected to /lost+found/.
> Directory inode  18581312 has been reconnected to /lost+found/.
> Directory inode  18556038 has been reconnected to /lost+found/.
> .
> .
> .
> Directory inode  448971 has been reconnected to /lost+found/.
> File inode  443531 has been reconnected to /lost+found/.
> Directory inode  442414 has been reconnected to /lost+found/.
> .
> .
> .
> Directory inode  2320 has been reconnected to /lost+found/.
> Directory inode  101 has been reconnected to /lost+found/.
> Directory inode  32 has been reconnected to /lost+found/.
> 622 directories reconnected to /lost+found/.
> 1 file reconnected to /lost+found/.
> **Phase 7 - Rebuild File/Directory Allocation Maps
> **Phase 8 - Rebuild Disk Allocation Maps
> **Phase 9 - Reformat File System Log
> logformat returned rc = 0
> Filesystem Summary:
> Blocks in use for inodes:  2276956
> Inode count:  18215648
> File count:  16453081
> Directory count:  1529882
> Block count:  4756914448
> Free block count:  655162544
> 19027657792 kilobytes total disk space.
>   6342069 kilobytes in 1529882 directories.
> 16397493672 kilobytes in 16453081 user files.
>         0 kilobytes in extended attributes
>         0 kilobytes in access control lists
>  15856013 kilobytes reserved for system use.
> 2620650176 kilobytes are available for use.
> Filesystem is clean.
> All observed inconsistencies have been repaired.
> Filesystem has been marked clean.
> **** Filesystem was modified. ****
> processing terminated:  4/23/2010 9:08:55  with return code: 0  exit code: 1.
> 
> This problem appears to be related to age and/or the number of directories in
> the root node. It's hard to distinguish between these two factors in our
> environment because the root node of each data volume contains one directory
> for each day the volume has been in use. The tipping point appears to be
> around 500 days/directories.
> 
> Is this a known issue? Is there really a problem with the root node, or does
> fsck.jfs have an analysis bug? In any event, since the OS can list the
> contents of the root node, fsck.jfs should be able to do better than just
> dumping all of its contents into lost+found.
> 
> I've also seen corruption in my allocation maps, which could be related. How
> can I help debug this further?
> 
> Thanks!
> 
> Tim
> 
> _______________________________________________
> Jfs-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/jfs-discussion
