For what it's worth, we have not seen a single journal replay error since the 
last patch and are using the default journal size…

Tim

On Sep 4, 2011, at 7:46 AM, Sandon Van Ness wrote:

>  Well, I went through several more fsck's due to 2.6.39 (I upgraded for newer 
> SCSI driver support for a new RAID controller), which had a bug with USB 
> hotplug events that was causing panics. 
> 
> Anyway, now I am wondering if it's possible that the reason my journal isn't 
> replaying is that my journal size is 1024 MB. I remember making it 1024 MB 
> when I formatted it. I now vaguely remember something about a supposed 
> maximum size of 128 MB for the journal log? 
> 
> Could a 1024 MB journal log cause this? I assumed I should make it bigger 
> because of how big the volume is, but maybe that was pointless? 
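> 
> For reference, here is a minimal sketch of the sizing rule as I understand 
> it, assuming the often-quoted default of "0.4% of the aggregate, capped at 
> 128 MB" holds (the jfs_mkfs(8) man page and the jfsutils source are the 
> real authority; suggested_log_mb is just a hypothetical helper): 
> 
>     /* Hypothetical helper, NOT from jfsutils: suggested inline journal
>      * size under the assumed rule "0.4% of the aggregate, max 128 MB". */
>     #include <stdio.h>
>     #include <stdint.h>
> 
>     static uint64_t suggested_log_mb(uint64_t aggregate_bytes)
>     {
>         uint64_t mb = (aggregate_bytes / 250) >> 20;  /* ~0.4%, in MB */
>         if (mb < 1)
>             mb = 1;
>         if (mb > 128)             /* the cap mentioned in this thread */
>             mb = 128;
>         return mb;
>     }
> 
>     int main(void)
>     {
>         /* the 36 TB volume from this thread: the cap kicks in long
>          * before 1024 MB ever would */
>         uint64_t vol = 36ULL * 1000 * 1000 * 1000 * 1000;
>         printf("suggested log size: %llu MB\n",
>                (unsigned long long)suggested_log_mb(vol));
>         return 0;
>     }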
> 
> I want to use the right size now, since I just set up an 84 TB usable (90 TB 
> raw) RAID array and did a test format (everything working well), but before 
> I do the final format I want to make sure I can avoid running into this 
> issue again (if it's avoidable). 
> 
> If it is the journal log size, then I can copy the 29 TB of data I currently 
> have on my 36 TB volume over to the 84 TB volume and re-create the 
> file-system (and defrag it in the process). 
> 
> 
> On 07/29/2011 09:13 AM, Dave Kleikamp wrote: 
>> On 07/28/2011 07:10 AM, Sandon Van Ness wrote: 
>>>   On 04/22/2011 05:42 AM, Dave Kleikamp wrote: 
>>>> Doh! You're right. I was thinking it was something it got at compile 
>>>> time. 
>>>> 
>>>> Yeah, I trust you, now that you pointed out the hard-coded date in the 
>>>> header.  :-) 
>>>> 
>>>> I'll have to try to recreate the problem again and see what else needs 
>>>> fixing. 
>>>> 
>>>> Thanks, 
>>>> Shaggy 
>>>> 
>>> Ok, so my computer kernel panic'd (damn nvidia GPU drivers) and I had to 
>>> do an fsck again (the first time since I previously replied to this 
>>> thread). 
>>> 
>>> One bit of behavior I noticed is that it sat at "Replay Journal Log" for 
>>> quite some time before it finally errored out with "logredo failed", yet 
>>> it still wasn't able to replay the journal. I seem to remember that 
>>> before, it would almost instantly say logredo failed: 
>>> 
>>> fsck.jfs version 1.1.15, 04-Mar-2011 
>>> processing started: 7/25/2011 22:53:12 
>>> The current device is:  /dev/sdd1 
>>> Block size in bytes:  4096 
>>> Filesystem size in blocks:  8718748407 
>>> **Phase 0 - Replay Journal Log 
>>> logredo failed (rc=-220).  fsck continuing. 
>> Failed updating the block map. I'll need to look into this. 
>> 
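>> Conceptually, Phase 0 is replaying a write-ahead log. A minimal sketch of 
>> the idea (NOT the actual logredo code; write_block is a made-up stand-in 
>> for the real disk I/O) shows why one failed update, like the block map 
>> here, aborts the whole replay and leaves the rest to a full fsck: 
>> 
>>     #include <stddef.h>
>>     #include <stdint.h>
>> 
>>     /* hypothetical primitive standing in for the real disk write */
>>     static int write_block(uint64_t blk, const void *data, size_t len)
>>     {
>>         (void)blk; (void)data; (void)len;
>>         return 0;                 /* pretend the write succeeded */
>>     }
>> 
>>     struct log_record { uint64_t block; size_t len; const void *data; };
>> 
>>     /* re-apply committed after-images in log order; any failure
>>      * leaves the file system marked dirty, as with rc=-220 above */
>>     static int replay_journal(const struct log_record *recs, size_t n)
>>     {
>>         for (size_t i = 0; i < n; i++)
>>             if (write_block(recs[i].block, recs[i].data, recs[i].len))
>>                 return -1;        /* abort; a full fsck must repair */
>>         return 0;                 /* replay done; log can be reset */
>>     }
>> 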
>>> **Phase 1 - Check Blocks, Files/Directories, and Directory Entries 
>>> **Phase 2 - Count links 
>>> Incorrect link counts have been detected. Will correct. 
>>> **Phase 3 - Duplicate Block Rescan and Directory Connectedness 
>>> **Phase 4 - Report Problems 
>>> File system object DF3649600 is linked as: 
>>> /boxbackup/mail/sandon/Maildir/.Eastvale yahoogroup/cur 
>>> cannot repair the data format error(s) in this directory. 
>>> cannot repair DF3649600.  Will release. 
>>> File system object DF3704486 is linked as: 
>>> /boxbackup/mail/sandon/Maildir/.saturation/cur 
>>> cannot repair the data format error(s) in this directory. 
>>> cannot repair DF3704486.  Will release. 
>>> File system object DF3704736 is linked as: 
>>> /boxbackup/mail/sandon/Maildir/.saturation 
>>> **Phase 5 - Check Connectivity 
>>> **Phase 6 - Perform Approved Corrections 
>>> 103120 files reconnected to /lost+found/. 
>>> **Phase 7 - Rebuild File/Directory Allocation Maps 
>>> **Phase 8 - Rebuild Disk Allocation Maps 
>>> **Phase 9 - Reformat File System Log 
>>> 34874993628 kilobytes total disk space. 
>>>    1890058 kilobytes in 651997 directories. 
>>> 26331821630 kilobytes in 6731444 user files. 
>>>      11924 kilobytes in extended attributes 
>>>    9376504 kilobytes reserved for system use. 
>>> 8535673628 kilobytes are available for use. 
>>> Filesystem is clean. 
>>> 
>>> The three directories that went to lost+found weren't a big deal, since 
>>> they were just backups. They are also huge directories with tens of 
>>> thousands of files in them. 
>>> 
>>> Also, I was kind of curious whether fsck.jfs uses libaio or another kind 
>>> of multi-threaded I/O that speeds things up on raid arrays? The fsck took 
>>> about 15 minutes, and the disk activity on my array seemed much higher 
>>> than with most single-threaded apps doing lots of random reads, although 
>>> it could just be that a lot of my metadata is arranged sequentially on the 
>>> array and that is why. 
>> fsck.jfs doesn't do anything special to optimize I/O. 
>> 
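>> For reference, the libaio pattern being asked about looks roughly like the 
>> sketch below (illustrative only; fsck.jfs does not do this, the device 
>> path is just the one from the log above, and NREQ/BLK are made up): 
>> 
>>     /* Batched reads via libaio: several requests in flight at once so
>>      * a RAID array can service them in parallel. Build with: gcc -laio */
>>     #define _GNU_SOURCE           /* for O_DIRECT */
>>     #include <fcntl.h>
>>     #include <libaio.h>
>>     #include <stdio.h>
>>     #include <stdlib.h>
>> 
>>     #define NREQ 4
>>     #define BLK  4096
>> 
>>     int main(void)
>>     {
>>         io_context_t ctx = 0;
>>         struct iocb cbs[NREQ], *cbp[NREQ];
>>         struct io_event events[NREQ];
>>         void *bufs[NREQ];
>>         /* O_DIRECT is what makes kernel AIO genuinely asynchronous */
>>         int fd = open("/dev/sdd1", O_RDONLY | O_DIRECT);
>> 
>>         if (fd < 0 || io_setup(NREQ, &ctx) < 0)
>>             return 1;
>>         for (int i = 0; i < NREQ; i++) {
>>             if (posix_memalign(&bufs[i], BLK, BLK))
>>                 return 1;
>>             /* queue read i without waiting for read i-1 to finish */
>>             io_prep_pread(&cbs[i], fd, bufs[i], BLK, (long long)i * BLK);
>>             cbp[i] = &cbs[i];
>>         }
>>         if (io_submit(ctx, NREQ, cbp) != NREQ)        /* all in flight */
>>             return 1;
>>         io_getevents(ctx, NREQ, NREQ, events, NULL);  /* wait for all */
>>         io_destroy(ctx);
>>         printf("%d reads completed\n", NREQ);
>>         return 0;
>>     }
>> 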
>>> Also, very soon (less than a month) I will be building a 30x3TB (raid6) 
>>> array, so 84 TB (76.4 TiB), and I will get a chance to try jfs with 
>>> >64 TiB. Since my current file-system, which is over 75% full and over 
>>> 32 TiB, is working ok, I don't suspect any problems. 
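>>> 
>>> (Working that out: raid6 keeps 30 - 2 = 28 data disks, and 28 x 3 TB = 
>>> 84 TB; 84 x 10^12 bytes / 2^40 is roughly 76.4 TiB, which is what puts 
>>> this volume past the 64 TiB mark.) 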
>> I'm not sure if the problems above might be large file system related, 
>> or not. It's possible that we might hit some new limit with a larger 
>> filesystem, so I'd be interested if you have any more issues. 
>> 
>>> I do recall Tim mentioning that this did fix his problem, but he had 
>>> smaller volumes (24 TB, so larger than 16 TiB but smaller than 32 TiB); 
>>> not sure if that matters or not. 
>>> 
> 
