I'd completely forgotten about an issue we had a ways back that might be 
related. 

About two years ago we internally identified a JFS oddity we coined "STP 
directories" (because they were dead and bloated).  We use a number of 
temporary directories for a cluster where 5-250 files per second, each 
100 bytes to 50 KB, are created, and shortly after read and destroyed.  The 
number of files in the directory at any given time is 100-5000.

After a while, not only did ls and find take a long time, but eventually the 
directory just stopped.  Any attempt to stat, touch, find, read, write, 
delete, etc. would just hang.

An ls in the folder directly above showed the directory size as somewhere 
around 70 MB.  Not the contents; the folder itself.  We didn't have the time 
to dig through the source, so we just moved the directory out of the way and 
made a new one.  A copy eventually got all the files out of the "big" 
folders.
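For anyone who wants to see what I mean: the size ls -l reports for a 
directory is the on-disk size of the directory file itself (its entry 
blocks), not the total size of its contents.  A quick demo on any Linux box 
(the path is just an example):

```shell
# The size column of ls -ld is the directory's own on-disk size --
# this is the number that had ballooned to ~70 MB for us.
D=$(mktemp -d)
ls -ld "$D"                  # size column: the directory file itself
SIZE=$(stat -c '%s' "$D")    # same figure via stat
rmdir "$D"
```

A freshly made directory is a few KB at most; ours sat at ~70 MB while 
holding only a few thousand files.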

We theorized at the time (again, without looking at the source, so this is 
coming straight out of my @ss) that as files were deleted, the directory 
entries were not being completely freed or reused, so blocks kept getting 
allocated and never deallocated.  fsck gave nothing back.

The directory swap is now scripted to happen once a week.  We actually 
re-wrote a bit of our app to use either directory, so we can move the 
directory aside, create a new one, and not delete the old one until all of 
its files have been processed.  The really high volume directories (100-250 
files per second) were broken up into per-minute subdirectories.  This 
solution was acceptable to us and I forgot to report it.
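The per-minute layout is nothing fancy; a rough sketch (the spool path and 
timestamp format are made-up examples, not our real ones):

```shell
#!/bin/sh
# Writers drop files into a fresh subdirectory each minute, so no
# single directory accumulates enough create/delete churn to bloat.
# A separate sweep job removes old subdirectories once their files
# have been processed.
SPOOL=${SPOOL:-/tmp/myapp-spool}
SUBDIR="$SPOOL/$(date +%Y%m%d-%H%M)"
mkdir -p "$SUBDIR"
printf '%s\n' "$SUBDIR"
```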

We had moved all our folders like this off ext2 around the time JFS made it 
into mainline.  We've experimented with migrating off JFS to ext3 or XFS, but 
other issues always crop up.  In fact, head-to-head, our two-year-old JFS 
filesystems still beat _new_ XFS, ext3 and reiser4 in our application.

Other than this bloat, which I admit is the result of extreme use, JFS just 
works.  It uses less CPU, is stable, has never been damaged past the point of 
fsck repair, and resizes easily.  I don't think we've ever lost a file.  We 
just have some extras we can't delete in these old STP directories :)

The only other thing I'd like to see changed is the default journal size 
calculation.  0.4% of 1 TB+ is a bit big, and makes normal log 
recoveries/checks feel like the bad old ext2 days.  A static journal size cap 
above a certain filesystem size would seem to make sense, and would preserve 
what I assume was the original intent of the percentage.
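To put numbers on it, the back-of-the-envelope math behind the complaint:

```shell
# The default JFS journal is about 0.4% of the filesystem size, so a
# 1 TB volume carries roughly 4 GB of log to scan on recovery.
FS_BYTES=$((1024 * 1024 * 1024 * 1024))    # 1 TB
LOG_BYTES=$((FS_BYTES * 4 / 1000))         # 0.4% of that
echo "$((LOG_BYTES / 1024 / 1024)) MB of journal"
```

If I remember the man page right, mkfs.jfs's -s option already lets you set 
the log size in megabytes by hand; I'm just suggesting a sane cap like that 
become the default on large filesystems.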

Anything I can do to help with this, please let me know.

dave


_______________________________________________
Jfs-discussion mailing list
Jfs-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jfs-discussion