Well, I went through several more fsck's due to 2.6.39 (I upgraded for newer SCSI driver support for a new RAID controller), which had some bug with USB plug events that was causing panics.

Anyway, now I am wondering if it's possible that the reason my journal isn't replaying is that my journal size is 1024 MB. I remember making it 1024 MB when I formatted the volume, and I now vaguely remember something about a supposed maximum size of 128 MB for the journal log?

Could a 1024 MB journal log cause this? I assumed I should make it bigger because of how big the volume is, but maybe that was pointless?

I want to use the right size now, since I just set up an 84 TB usable (90 TB raw) RAID array and did a test format (everything working well), but before I do the final format I want to make sure I can avoid running into this issue again (if it's avoidable).
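For the final format I'm thinking of something along these lines (device names are just placeholders; I'm assuming mkfs.jfs's -s option, which takes the journal size in megabytes, is the right knob, and I think jfs_tune -l dumps the superblock so I could confirm what the old volume was actually formatted with):

  # Check what the existing (problem) volume was formatted with; I believe
  # the superblock listing shows the journal/log details.
  jfs_tune -l /dev/sdd1

  # Final format of the new array with an explicitly chosen journal size
  # (128 MB here only because that is the limit I vaguely remember).
  mkfs.jfs -s 128 /dev/sdX1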

If it is the journal log size, then I can copy the 29 TB of data I currently have on my 36 TB volume over to the 84 TB volume and re-create the file-system (and defrag it in the process).
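If I do end up re-creating it, the copy itself would just be something along these lines (the mount points are made-up placeholders; -X because the fsck output below shows the volume does have some extended attributes on it):

  # Copy everything from the old 36 TB volume to the new 84 TB one,
  # preserving hard links, ACLs and extended attributes.
  rsync -aHAX --progress /mnt/old36tb/ /mnt/new84tb/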


On 07/29/2011 09:13 AM, Dave Kleikamp wrote:
On 07/28/2011 07:10 AM, Sandon Van Ness wrote:
  On 04/22/2011 05:42 AM, Dave Kleikamp wrote:
Doh! You're right. I was thinking it was something it got at compile
time.

Yeah, I trust you, now that you pointed out the hard-coded date in the
header.  :-)

I'll have to try to recreate the problem again and see what else needs
fixing.

Thanks,
Shaggy

OK, so my computer kernel panicked (damn nvidia GPU drivers) and I had to
do an fsck again (the first time since I previously replied to this
thread).

One bit of behavior I noticed is that it sat at the "Replay Journal Log"
phase for quite some time before it finally errored out with "logredo
failed", but it still wasn't able to replay it. I seem to remember that
before, it would almost instantly say logredo failed:

fsck.jfs version 1.1.15, 04-Mar-2011
processing started: 7/25/2011 22:53:12
The current device is:  /dev/sdd1
Block size in bytes:  4096
Filesystem size in blocks:  8718748407
**Phase 0 - Replay Journal Log
logredo failed (rc=-220).  fsck continuing.
Failed updating the block map. I'll need to look into this.

**Phase 1 - Check Blocks, Files/Directories, and  Directory Entries
**Phase 2 - Count links
Incorrect link counts have been detected. Will correct.
**Phase 3 - Duplicate Block Rescan and Directory Connectedness
**Phase 4 - Report Problems
File system object DF3649600 is linked as:
/boxbackup/mail/sandon/Maildir/.Eastvale yahoogroup/cur
cannot repair the data format error(s) in this directory.
cannot repair DF3649600.  Will release.
File system object DF3704486 is linked as:
/boxbackup/mail/sandon/Maildir/.saturation/cur
cannot repair the data format error(s) in this directory.
cannot repair DF3704486.  Will release.
File system object DF3704736 is linked as:
/boxbackup/mail/sandon/Maildir/.saturation
**Phase 5 - Check Connectivity
**Phase 6 - Perform Approved Corrections
103120 files reconnected to /lost+found/.
**Phase 7 - Rebuild File/Directory Allocation Maps
**Phase 8 - Rebuild Disk Allocation Maps
**Phase 9 - Reformat File System Log
34874993628 kilobytes total disk space.
   1890058 kilobytes in 651997 directories.
26331821630 kilobytes in 6731444 user files.
     11924 kilobytes in extended attributes
   9376504 kilobytes reserved for system use.
8535673628 kilobytes are available for use.
Filesystem is clean.

The three directories that went to lost+found weren't a big deal, since
they were just backups. They are also huge directories with tens of
thousands of files in them.

Also, I was kind of curious whether fsck for JFS uses libaio or another
type of multi-threaded I/O that speeds up I/O on RAID arrays? The
fsck took about 15 minutes, and the disk activity on my array seemed
much higher than with most single-threaded apps that do a lot of
random reads on the array, although it could just be that a lot of my
metadata is arranged sequentially on the array and that is why.
fsck.jfs doesn't do anything special to optimize I/O.
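For what it's worth, I realize I could sanity-check that myself with something like the following (hypothetical command lines; the fsck.jfs path may differ):

  # See whether fsck.jfs is even linked against libaio.
  ldd /sbin/fsck.jfs | grep -i libaio

  # Or watch for kernel AIO syscalls while a read-only check runs.
  strace -f -e trace=io_setup,io_submit,io_getevents fsck.jfs -n /dev/sdd1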

Also, very soon (less than a month) I will be building a 30x3TB (RAID 6)
array, so 84 TB (76.4 TiB), and I will get a chance to try JFS with >64 TiB.
Since my current file-system, which is over 75% full and over 32 TiB, is
working OK, I don't suspect any problems.
I'm not sure whether the problems above are related to the large file
system or not. It's possible that we might hit some new limit with a larger
filesystem, so I'd be interested if you have any more issues.

I do recall Tim mentioning that this did fix his problem, but he had
smaller volumes (24 TB), so larger than 16 TiB but smaller than 32 TiB (not
sure if that matters or not).

