I tested the patch tonight with mixed results...
The good news is that it fixed the disk allocation maps bug in jfs_mkfs and
fsck.jfs now runs cleanly on a newly formatted volume.
Unfortunately, I'm still seeing the logredo failure. To test this, I
simulated a RAID failure while my server was under heavy load and then
re-assembled the array. As expected, the JFS file system on this array was not
closed cleanly. This should be equivalent to the state after a power failure,
but it was easier to do remotely :-)
Here's the fsck output showing the rc=-231 error during logredo:
fsck.jfs -v /dev/md15
fsck.jfs version 1.1.14, 06-Apr-2009
processing started: 6/6/2010 3.6.44
Using default parameter: -p
The current device is: /dev/md15
Open(...READ/WRITE EXCLUSIVE...) returned rc = 0
Primary superblock is valid.
The type of file system for the device is JFS.
Block size in bytes: 4096
Filesystem size in blocks: 4756914448
**Phase 0 - Replay Journal Log
LOGREDO: Log record for Sync Point at: 0x088276c
LOGREDO: Beginning to update the Inode Allocation Map.
LOGREDO: Done updating the Inode Allocation Map.
LOGREDO: Beginning to update the Block Map.
LOGREDO: Incorrect leaf index detected (k=(d) 0, j=(d) 0, idx=(d) 0) while writing Block Map.
LOGREDO: Write Block Map control page failed in UpdateMaps().
LOGREDO: Unable to update map(s).
logredo failed (rc=-231). fsck continuing.
**Phase 1 - Check Blocks, Files/Directories, and Directory Entries
.
.
.
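For anyone scripting a reproduction of this, here is a minimal sketch of pulling the logredo return code out of saved fsck.jfs output. The helper name `logredo_rc` and the regular expression are assumptions inferred from the single "logredo failed (rc=-231)" line shown above; other jfsutils versions may word the message differently.

```python
import re

# Hypothetical helper, not part of jfsutils. The pattern is inferred from
# the "logredo failed (rc=-231). fsck continuing." line in the output above.
LOGREDO_FAILED = re.compile(r"logredo failed \(rc=(-?\d+)\)")


def logredo_rc(fsck_output):
    """Return logredo's rc as an int if fsck reported a failure, else None."""
    match = LOGREDO_FAILED.search(fsck_output)
    return int(match.group(1)) if match else None


# A few lines copied from the fsck.jfs output above.
sample = """\
LOGREDO: Unable to update map(s).
logredo failed (rc=-231). fsck continuing.
**Phase 1 - Check Blocks, Files/Directories, and Directory Entries
"""

print(logredo_rc(sample))
```

Returning None when the line is absent lets a test script tell "journal replayed" apart from "replay failed" without depending on fsck's exit status.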
Here's the volume information:
jfs_tune -l /dev/md15
jfs_tune version 1.1.14, 06-Apr-2009
JFS filesystem superblock:
JFS magic number: 'JFS1'
JFS version: 1
JFS state: mounted
JFS flags: JFS_LINUX JFS_COMMIT JFS_GROUPCOMMIT JFS_INLINELOG
Aggregate block size: 4096 bytes
Aggregate size: 38053891680 blocks
Physical block size: 512 bytes
Allocation group size: 67108864 aggregate blocks
Log device number: 0x90b
Filesystem creation: Sun Jun 6 01:35:43 2010
Volume label: 'id-0850422'
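The two tools report the volume size in different units, so here is a small sketch that converts both figures to bytes for comparison. It assumes that the jfs_tune "Aggregate size" is counted in 512-byte physical blocks and that the fsck.jfs figure is counted in 4096-byte blocks; that reading is an assumption drawn from the block sizes printed next to each number, not something either tool states outright.

```python
TIB = 2 ** 40  # bytes per tebibyte


def size_bytes(blocks, block_size):
    """Convert a block count to bytes."""
    return blocks * block_size


# Figures copied from the fsck.jfs and jfs_tune output above.
fsck_bytes = size_bytes(4756914448, 4096)   # assumed: 4096-byte blocks
tune_bytes = size_bytes(38053891680, 512)   # assumed: 512-byte physical blocks

for name, value in (("fsck.jfs", fsck_bytes), ("jfs_tune", tune_bytes)):
    print(f"{name}: {value} bytes, about {value / TIB:.2f} TiB")
```

Under those assumptions both figures come out near 17.7 TiB, so the two reports describe the same volume.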
During this test I also hit a kernel-level JFS bug, which I assume is unrelated
to the jfsutils patch, but I'm including it anyway just in case the changes to
jfs_mkfs somehow caused this...
Jun 6 01:50:10 ul085 kernel: [124387.345574] ERROR: (device md11): txAbort
Jun 6 01:50:10 ul085 kernel: [124387.394483] BUG at fs/jfs/jfs_txnmgr.c:939 assert(mp->nohomeok > 0)
Jun 6 01:50:10 ul085 kernel: [124387.394533] ------------[ cut here ]------------
Jun 6 01:50:10 ul085 kernel: [124387.394555] kernel BUG at fs/jfs/jfs_txnmgr.c:939!
Jun 6 01:50:10 ul085 kernel: [124387.394576] invalid opcode: 0000 [1] SMP
Jun 6 01:50:10 ul085 kernel: [124387.394597] CPU 0
Jun 6 01:50:10 ul085 kernel: [124387.396184] Modules linked in: xt_multiport ipt_LOG xt_tcpudp xt_state iptable_filter ip_tables nf_conntrack_pptp nf_conntrack_proto_dccp nf_conntrack_ftp nf_conntrack_proto_udplite nf_conntrack_netlink nfnetlink nf_nat nf_conntrack_tftp nf_conntrack_irc nf_conntrack_sip xt_conntrack x_tables nf_conntrack_h323 nf_conntrack_proto_sctp ts_kmp nf_conntrack_amanda nf_conntrack_netbios_ns nf_conntrack_proto_gre nf_conntrack_sane nf_conntrack_ipv6 nf_conntrack_ipv4 nf_conntrack ipv6 jfs nls_base fuse coretemp w83627ehf hwmon_vid sbp2 loop serio_raw snd_hda_intel pcspkr i2c_i801 psmouse snd_pcm i2c_core snd_timer snd soundcore snd_page_alloc intel_agp button evdev ext3 jbd mbcache raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod ide_disk ata_generic sd_mod jmicron ata_piix ohci1394 ieee1394 ide_pci_generic ide_core sata_sil24 ehci_hcd libata scsi_mod dock e1000e uhci_hcd thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Jun 6 01:50:10 ul085 kernel: [124387.396500] Pid: 3845, comm: jfsCommit Tainted: G W 2.6.26-2-amd64 #1
Jun 6 01:50:10 ul085 kernel: [124387.396500] RIP: 0010:[<ffffffffa02c498f>] [<ffffffffa02c498f>] :jfs:txUnlock+0xb2/0x211
Jun 6 01:50:10 ul085 kernel: [124387.396500] RSP: 0018:ffff8100c4765e90 EFLAGS: 00010286
Jun 6 01:50:10 ul085 kernel: [124387.396500] RAX: 000000000000004b RBX: ffff810017c8dce0 RCX: 000000000117419b
Jun 6 01:50:10 ul085 kernel: [124387.396500] RDX: ffff810080a46000 RSI: 0000000000000046 RDI: 0000000000000286
Jun 6 01:50:10 ul085 kernel: [124387.396500] RBP: ffffc20009901258 R08: ffff8100c719ae00 R09: ffff8100c4765a00
Jun 6 01:50:10 ul085 kernel: [124387.396500] R10: 0000000000000000 R11: 0000010000dd80d0 R12: ffffc200097c23c0
Jun 6 01:50:10 ul085 kernel: [124387.396500] R13: ffff8100c719ae00 R14: 0000000000b300b3 R15: 0000000000000004
Jun 6 01:50:10 ul085 kernel: [124387.396500] FS: 0000000000000000(0000) GS:ffffffff8053d000(0000) knlGS:0000000000000000
Jun 6 01:50:10 ul085 kernel: [124387.396500] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Jun 6 01:50:10 ul085 kernel: [124387.396500] CR2: 00007fb67ce53000 CR3: 00000000c3ccb000 CR4: 00000000000006e0
Jun 6 01:50:10 ul085 kernel: [124387.396500] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 6 01:50:10 ul085 kernel: [124387.396500] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 6 01:50:10 ul085 kernel: [124387.396500] Process jfsCommit (pid: 3845, threadinfo ffff8100c4764000, task ffff8100ca30ab50)
Jun 6 01:50:10 ul085 kernel: [124387.396500] Stack: 0000ffff805a64c0 ffff8100c719ae00 ffffc200097c23c0 0000000000000286
Jun 6 01:50:10 ul085 kernel: [124387.396500] ffff8100cadd1580 ffffffff805a64c0 0000000000000004 ffffffffa02c73db
Jun 6 01:50:10 ul085 kernel: [124387.396500] 0000000000000000 ffff8100ca30ab50 ffffffff8022c108 0000000000100100
Jun 6 01:50:10 ul085 kernel: [124387.396500] Call Trace:
Jun 6 01:50:10 ul085 kernel: [124387.396500] [<ffffffffa02c73db>] ? :jfs:jfs_lazycommit+0xfb/0x22e
Jun 6 01:50:10 ul085 kernel: [124387.396500] [<ffffffff8022c108>] ? default_wake_function+0x0/0xe
Jun 6 01:50:10 ul085 kernel: [124387.396500] [<ffffffffa02c72e0>] ? :jfs:jfs_lazycommit+0x0/0x22e
Jun 6 01:50:10 ul085 kernel: [124387.396500] [<ffffffff80245feb>] ? kthread+0x47/0x74
Jun 6 01:50:10 ul085 kernel: [124387.396500] [<ffffffff8023008d>] ? schedule_tail+0x27/0x5c
Jun 6 01:50:10 ul085 kernel: [124387.396500] [<ffffffff8020cf38>] ? child_rip+0xa/0x12
Jun 6 01:50:10 ul085 kernel: [124387.396500] [<ffffffff80245fa4>] ? kthread+0x0/0x74
Jun 6 01:50:10 ul085 kernel: [124387.396500] [<ffffffff8020cf2e>] ? child_rip+0x0/0x12
Jun 6 01:50:10 ul085 kernel: [124387.396500]
Jun 6 01:50:10 ul085 kernel: [124387.396500]
Jun 6 01:50:10 ul085 kernel: [124387.396500] Code: d2 ff ff 8b 43 68 85 c0 7f 25 48 c7 c1 03 c0 2c a0 ba ab 03 00 00 48 c7 c6 d3 bf 2c a0 48 c7 c7 e7 bf 2c a0 31 c0 e8 8e 09 f7 df <0f> 0b eb fe ff c8 85 c0 89 43 68 75 09 48 8b 7b 58 e8 00 4a fb
Jun 6 01:50:10 ul085 kernel: [124387.396500] RIP [<ffffffffa02c498f>] :jfs:txUnlock+0xb2/0x211
Jun 6 01:50:10 ul085 kernel: [124387.396500] RSP <ffff8100c4765e90>
Jun 6 01:50:10 ul085 kernel: [124387.396500] ---[ end trace 4eaa2a86a8e2da22 ]---
Let me know if there is anything else I can provide to help debug this issue.
Thanks!
Tim
On Jun 4, 2010, at 12:06 AM, Sandon Van Ness wrote:
> Dave Kleikamp wrote:
>>
>> On Tue, 2010-05-25 at 15:15 -0700, Tim Nufire wrote:
>>
>>> Dave,
>>>
>>> Any update on this issue? Every time I run fsck on a volume greater
>>> than 12TB I have this problem.
>>>
>>
>> I haven't seen this exact failure, but the patch I sent you on the other
>> problem may be enough to fix this. Could you let me know if you're
>> still having any problems with the latest code?
>>
>> Thanks,
>> Shaggy
>>
>
> Tim may not have a system he can easily test this 'on demand' but as soon as
> my newly setup raid array finishes initializing and i copy all my data to it
> (about 2 days) I can go ahead and test this to see if the logredo fails like
> it used to on my system.
>
> This was the only other problem I had with JFS which wasn't that big of a
> deal for me as I was forced to do an fsck only 5 or 6 times and fsck's only
> took around 10-11 minutes on my system. I will reply to this thread to let
> you know the results (if Tim doesn't before me).
_______________________________________________
Jfs-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/jfs-discussion