I tested the patch tonight with mixed results...

The good news is that it fixed the disk allocation maps bug in jfs_mkfs, and 
fsck.jfs now runs cleanly on a newly formatted volume.

Unfortunately, I'm still seeing the logredo failure... To test this, I 
simulated a RAID failure while my server was under heavy load and then 
re-assembled the array. As expected, the JFS file system on this array was not 
closed cleanly... This should be equivalent to the state left by a power 
failure, but it was easier to do remotely :-)
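
In case it helps reproduce the setup, the simulation was roughly the sequence 
below (the device names and mount point are placeholders, not the actual 
members of md15):

# Fail enough members under load to take the array down
mdadm /dev/md15 --fail /dev/sdX1
mdadm /dev/md15 --fail /dev/sdY1
# The mounted JFS volume is now dead; lazily unmount it and stop the array
umount -l /mnt/jfs
mdadm --stop /dev/md15
# Force re-assembly with the stale members
mdadm --assemble --force /dev/md15 /dev/sdX1 /dev/sdY1 /dev/sdZ1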

Here's the output of fsck showing the rc=-231 error during the logredo....

fsck.jfs -v /dev/md15
fsck.jfs version 1.1.14, 06-Apr-2009
processing started: 6/6/2010 3.6.44
Using default parameter: -p
The current device is:  /dev/md15
Open(...READ/WRITE EXCLUSIVE...) returned rc = 0
Primary superblock is valid.
The type of file system for the device is JFS.
Block size in bytes:  4096
Filesystem size in blocks:  4756914448
**Phase 0 - Replay Journal Log
LOGREDO:  Log record for Sync Point at:    0x088276c
LOGREDO:  Beginning to update the Inode Allocation Map.
LOGREDO:  Done updating the Inode Allocation Map.
LOGREDO:  Beginning to update the Block Map.
LOGREDO:  Incorrect leaf index detected (k=(d) 0, j=(d) 0, idx=(d) 0) while 
writing Block Map.
LOGREDO:  Write Block Map control page failed in UpdateMaps().
LOGREDO:  Unable to update map(s).
logredo failed (rc=-231).  fsck continuing.
**Phase 1 - Check Blocks, Files/Directories, and  Directory Entries
.
.
.

Here's the volume information:

jfs_tune -l /dev/md15
jfs_tune version 1.1.14, 06-Apr-2009

JFS filesystem superblock:

JFS magic number:       'JFS1'
JFS version:            1
JFS state:              mounted
JFS flags:              JFS_LINUX  JFS_COMMIT  JFS_GROUPCOMMIT  JFS_INLINELOG  
Aggregate block size:   4096 bytes
Aggregate size:         38053891680 blocks
Physical block size:    512 bytes
Allocation group size:  67108864 aggregate blocks
Log device number:      0x90b
Filesystem creation:    Sun Jun  6 01:35:43 2010
Volume label:           'id-0850422'
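
(For scale: 4756914448 blocks * 4096 bytes/block works out to roughly 19.5 TB, 
so this volume is well past the ~12 TB point where I first started seeing the 
failure.)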

During this test I also hit a kernel-level JFS bug, which I assume is unrelated 
to the jfsutils patch, but I'm including it anyway just in case the changes to 
jfs_mkfs somehow caused it...

Jun  6 01:50:10 ul085 kernel: [124387.345574] ERROR: (device md11): txAbort
Jun  6 01:50:10 ul085 kernel: [124387.394483] BUG at fs/jfs/jfs_txnmgr.c:939 
assert(mp->nohomeok > 0)
Jun  6 01:50:10 ul085 kernel: [124387.394533] ------------[ cut here 
]------------
Jun  6 01:50:10 ul085 kernel: [124387.394555] kernel BUG at 
fs/jfs/jfs_txnmgr.c:939!
Jun  6 01:50:10 ul085 kernel: [124387.394576] invalid opcode: 0000 [1] SMP 
Jun  6 01:50:10 ul085 kernel: [124387.394597] CPU 0 
Jun  6 01:50:10 ul085 kernel: [124387.396184] Modules linked in: xt_multiport 
ipt_LOG xt_tcpudp xt_state iptable_filter ip_tables nf_conntrack_pptp 
nf_conntrack_proto_dccp nf_conntrack_ftp nf_conntrack_proto_udplite 
nf_conntrack_netlink nfnetlink nf_nat nf_conntrack_tftp nf_conntrack_irc 
nf_conntrack_sip xt_conntrack x_tables nf_conntrack_h323 
nf_conntrack_proto_sctp ts_kmp nf_conntrack_amanda nf_conntrack_netbios_ns 
nf_conntrack_proto_gre nf_conntrack_sane nf_conntrack_ipv6 nf_conntrack_ipv4 
nf_conntrack ipv6 jfs nls_base fuse coretemp w83627ehf hwmon_vid sbp2 loop 
serio_raw snd_hda_intel pcspkr i2c_i801 psmouse snd_pcm i2c_core snd_timer snd 
soundcore snd_page_alloc intel_agp button evdev ext3 jbd mbcache raid10 raid456 
async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod 
ide_disk ata_generic sd_mod jmicron ata_piix ohci1394 ieee1394 ide_pci_generic 
ide_core sata_sil24 ehci_hcd libata scsi_mod dock e1000e uhci_hcd thermal 
processor fan thermal_sys [last unloaded: scsi_wait_scan]
Jun  6 01:50:10 ul085 kernel: [124387.396500] Pid: 3845, comm: jfsCommit 
Tainted: G        W 2.6.26-2-amd64 #1
Jun  6 01:50:10 ul085 kernel: [124387.396500] RIP: 0010:[<ffffffffa02c498f>]  
[<ffffffffa02c498f>] :jfs:txUnlock+0xb2/0x211
Jun  6 01:50:10 ul085 kernel: [124387.396500] RSP: 0018:ffff8100c4765e90  
EFLAGS: 00010286
Jun  6 01:50:10 ul085 kernel: [124387.396500] RAX: 000000000000004b RBX: 
ffff810017c8dce0 RCX: 000000000117419b
Jun  6 01:50:10 ul085 kernel: [124387.396500] RDX: ffff810080a46000 RSI: 
0000000000000046 RDI: 0000000000000286
Jun  6 01:50:10 ul085 kernel: [124387.396500] RBP: ffffc20009901258 R08: 
ffff8100c719ae00 R09: ffff8100c4765a00
Jun  6 01:50:10 ul085 kernel: [124387.396500] R10: 0000000000000000 R11: 
0000010000dd80d0 R12: ffffc200097c23c0
Jun  6 01:50:10 ul085 kernel: [124387.396500] R13: ffff8100c719ae00 R14: 
0000000000b300b3 R15: 0000000000000004
Jun  6 01:50:10 ul085 kernel: [124387.396500] FS:  0000000000000000(0000) 
GS:ffffffff8053d000(0000) knlGS:0000000000000000
Jun  6 01:50:10 ul085 kernel: [124387.396500] CS:  0010 DS: 0018 ES: 0018 CR0: 
000000008005003b
Jun  6 01:50:10 ul085 kernel: [124387.396500] CR2: 00007fb67ce53000 CR3: 
00000000c3ccb000 CR4: 00000000000006e0
Jun  6 01:50:10 ul085 kernel: [124387.396500] DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000
Jun  6 01:50:10 ul085 kernel: [124387.396500] DR3: 0000000000000000 DR6: 
00000000ffff0ff0 DR7: 0000000000000400
Jun  6 01:50:10 ul085 kernel: [124387.396500] Process jfsCommit (pid: 3845, 
threadinfo ffff8100c4764000, task ffff8100ca30ab50)
Jun  6 01:50:10 ul085 kernel: [124387.396500] Stack:  0000ffff805a64c0 
ffff8100c719ae00 ffffc200097c23c0 0000000000000286
Jun  6 01:50:10 ul085 kernel: [124387.396500]  ffff8100cadd1580 
ffffffff805a64c0 0000000000000004 ffffffffa02c73db
Jun  6 01:50:10 ul085 kernel: [124387.396500]  0000000000000000 
ffff8100ca30ab50 ffffffff8022c108 0000000000100100
Jun  6 01:50:10 ul085 kernel: [124387.396500] Call Trace:
Jun  6 01:50:10 ul085 kernel: [124387.396500]  [<ffffffffa02c73db>] ? 
:jfs:jfs_lazycommit+0xfb/0x22e
Jun  6 01:50:10 ul085 kernel: [124387.396500]  [<ffffffff8022c108>] ? 
default_wake_function+0x0/0xe
Jun  6 01:50:10 ul085 kernel: [124387.396500]  [<ffffffffa02c72e0>] ? 
:jfs:jfs_lazycommit+0x0/0x22e
Jun  6 01:50:10 ul085 kernel: [124387.396500]  [<ffffffff80245feb>] ? 
kthread+0x47/0x74
Jun  6 01:50:10 ul085 kernel: [124387.396500]  [<ffffffff8023008d>] ? 
schedule_tail+0x27/0x5c
Jun  6 01:50:10 ul085 kernel: [124387.396500]  [<ffffffff8020cf38>] ? 
child_rip+0xa/0x12
Jun  6 01:50:10 ul085 kernel: [124387.396500]  [<ffffffff80245fa4>] ? 
kthread+0x0/0x74
Jun  6 01:50:10 ul085 kernel: [124387.396500]  [<ffffffff8020cf2e>] ? 
child_rip+0x0/0x12
Jun  6 01:50:10 ul085 kernel: [124387.396500] 
Jun  6 01:50:10 ul085 kernel: [124387.396500] 
Jun  6 01:50:10 ul085 kernel: [124387.396500] Code: d2 ff ff 8b 43 68 85 c0 7f 
25 48 c7 c1 03 c0 2c a0 ba ab 03 00 00 48 c7 c6 d3 bf 2c a0 48 c7 c7 e7 bf 2c 
a0 31 c0 e8 8e 09 f7 df <0f> 0b eb fe ff c8 85 c0 89 43 68 75 09 48 8b 7b 58 e8 
00 4a fb
Jun  6 01:50:10 ul085 kernel: [124387.396500] RIP  [<ffffffffa02c498f>] 
:jfs:txUnlock+0xb2/0x211
Jun  6 01:50:10 ul085 kernel: [124387.396500]  RSP <ffff8100c4765e90>
Jun  6 01:50:10 ul085 kernel: [124387.396500] ---[ end trace 4eaa2a86a8e2da22 
]---

Let me know if there is anything else I can provide to help debug this issue.

Thanks!

Tim

On Jun 4, 2010, at 12:06 AM, Sandon Van Ness wrote:

> Dave Kleikamp wrote:
>> 
>> On Tue, 2010-05-25 at 15:15 -0700, Tim Nufire wrote:
>>   
>>> Dave,
>>> 
>>> Any update on this issue? Every time I run fsck on a volume greater
>>> than 12TB I have this problem. 
>>>     
>> 
>> I haven't seen this exact failure, but the patch I sent you on the other
>> problem may be enough to fix this.  Could you let me know if you're
>> still having any problems with the latest code?
>> 
>> Thanks,
>> Shaggy
>>   
> 
> Tim may not have a system he can easily test this 'on demand', but as soon as 
> my newly set-up RAID array finishes initializing and I copy all my data to it 
> (about 2 days) I can go ahead and test this to see if the logredo fails like 
> it used to on my system.
> 
> This was the only other problem I had with JFS, and it wasn't that big of a 
> deal for me, as I was only forced to do an fsck 5 or 6 times and fscks only 
> took around 10-11 minutes on my system. I will reply to this thread to let 
> you know the results (if Tim doesn't before me).
