Ben Hutchings wrote:
> Does this happen without openafs loaded?

I can't confirm until I can reproduce the problem. It happens from
time to time on a 142-node HPC cluster where OpenAFS is a must. We
have also seen a similar problem with Ext4 (trace attached) on the
same kernel version, just a slightly older revision.

So far we suspect QEMU/KVM 1.4.1. We are currently migrating all
physical nodes to 1.5.0 from unstable/testing to check whether it
gets better.

-- 
Vlastimil Holer                             phone:   +420-549 49 5349
CERIT Scientific Cloud                      e-mail:  ho...@ics.muni.cz
Institute of Computer Science MU            twitter: @vholer
May 11 05:12:30 zapat2 kernel: [    0.000000] Linux version 3.2.0-0.bpo.4-amd64 
(debian-ker...@lists.debian.org) (gcc version 4.4.5 (Debian 4.4.5-8) ) #1 SMP 
Debian 3.2.41-2~bpo60+1
May 11 13:09:54 zapat2 kernel: [28657.959006] general protection fault: 0000 
[#1] SMP 
May 11 13:09:54 zapat2 kernel: [28657.960102] CPU 0 
May 11 13:09:55 zapat2 kernel: [28657.960546] Modules linked in: openafs(P) 
nls_utf8 isofs des_generic cbc rpcsec_gss_krb5 nfsd nfs lockd fscache 
auth_rpcgss nfs_acl sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state 
nf_conntrack xt_comment xt_multiport iptable_filter ip_tables x_tables ext2 
ib_ipoib mlx4_ib rdma_ucm rdma_cm ib_cm iw_cm ib_addr ib_sa ib_uverbs mlx4_en 
mlx4_core ib_umad ib_mad ib_core loop snd_pcm snd_timer snd soundcore 
snd_page_alloc crc32c_intel ghash_clmulni_intel aesni_intel psmouse i2c_piix4 
cryptd pcspkr aes_x86_64 i2c_core tpm_tis tpm i6300esb button tpm_bios evdev 
aes_generic processor thermal_sys serio_raw virtio_balloon ext4 mbcache jbd2 
crc16 dm_mod microcode virtio_net ata_generic virtio_blk virtio_pci floppy 
virtio_ring virtio uhci_hcd ata_piix ehci_hcd libata usbcore scsi_mod 
usb_common [last unloaded: scsi_wait_scan]
May 11 13:09:55 zapat2 kernel: [28657.961401] 
May 11 13:09:55 zapat2 kernel: [28657.961401] Pid: 18822, comm: flush-254:64 
Tainted: P           O 3.2.0-0.bpo.4-amd64 #1 Debian 3.2.41-2~bpo60+1 Bochs 
Bochs
May 11 13:09:55 zapat2 kernel: [28657.961401] RIP: 0010:[<ffffffffa012151a>]  
[<ffffffffa012151a>] write_cache_pages_da+0x1fa/0x313 [ext4]
May 11 13:09:55 zapat2 kernel: [28657.961401] RSP: 0018:ffff881e274719e0  
EFLAGS: 00010203
May 11 13:09:55 zapat2 kernel: [28657.961401] RAX: ffff8800bdf7f130 RBX: 
ffff881e27471b50 RCX: 0000000000000000
May 11 13:09:55 zapat2 kernel: [28657.961401] RDX: 00000041000b0000 RSI: 
0000000000000003 RDI: ffffea005186f498
May 11 13:09:55 zapat2 kernel: [28657.961401] RBP: ffff881e27471c90 R08: 
ffff881e27471930 R09: 0000000000000000
May 11 13:09:55 zapat2 kernel: [28657.961401] R10: 0007ffffffffffff R11: 
000000000040eeaf R12: ffff880e6ab5c230
May 11 13:09:55 zapat2 kernel: [28657.961401] R13: 0007ffffffffffff R14: 
fff2bd34000241a3 R15: 000000000040eea2
May 11 13:09:55 zapat2 kernel: [28657.961401] FS:  0000000000000000(0000) 
GS:ffff880f42c00000(0000) knlGS:0000000000000000
May 11 13:09:55 zapat2 kernel: [28657.961401] CS:  0010 DS: 0000 ES: 0000 CR0: 
000000008005003b
May 11 13:09:55 zapat2 kernel: [28657.961401] CR2: 00000000020f7068 CR3: 
0000000f061c9000 CR4: 00000000000406f0
May 11 13:09:55 zapat2 kernel: [28657.961401] DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000
May 11 13:09:55 zapat2 kernel: [28657.961401] DR3: 0000000000000000 DR6: 
00000000ffff0ff0 DR7: 0000000000000400
May 11 13:09:55 zapat2 kernel: [28657.961401] Process flush-254:64 (pid: 18822, 
threadinfo ffff881e27470000, task ffff881e27ee9120)
May 11 13:09:55 zapat2 kernel: [28657.961401] Stack:
May 11 13:09:55 zapat2 kernel: [28657.961401]  0000000000000001 
0000000000000400 ffff881e27471a30 0000000000000050
May 11 13:09:55 zapat2 kernel: [28657.961401]  0000000e00000000 
ffff880e6ab5c0f0 ffff881e27471bd8 ffff881e27471a30
May 11 13:09:55 zapat2 kernel: [28657.961401]  000000000000000e 
0000000000000000 ffffea005186f498 ffffea005186f4d0
May 11 13:09:55 zapat2 kernel: [28657.961401] Call Trace:
May 11 13:09:55 zapat2 kernel: [28657.961401]  [<ffffffffa0121a7a>] ? 
ext4_da_writepages+0x29f/0x45d [ext4]
May 11 13:09:55 zapat2 kernel: [28657.961401]  [<ffffffffa003b058>] ? 
virtqueue_kick+0x9/0x19 [virtio_ring]
May 11 13:09:55 zapat2 kernel: [28657.961401]  [<ffffffff811254ec>] ? 
writeback_single_inode+0x178/0x35e
May 11 13:09:55 zapat2 kernel: [28657.961401]  [<ffffffff8112599e>] ? 
writeback_sb_inodes+0x169/0x1ff
May 11 13:09:55 zapat2 kernel: [28657.961401]  [<ffffffff81125aa1>] ? 
__writeback_inodes_wb+0x6d/0xab
May 11 13:09:55 zapat2 kernel: [28657.961401]  [<ffffffff81125cc7>] ? 
wb_writeback+0x128/0x222
May 11 13:09:55 zapat2 kernel: [28657.961401]  [<ffffffff8136758c>] ? 
__schedule+0x5a0/0x5cd
May 11 13:09:55 zapat2 kernel: [28657.961401]  [<ffffffff81125f3a>] ? 
wb_do_writeback+0x179/0x1de
May 11 13:09:55 zapat2 kernel: [28657.961401]  [<ffffffff81055ccb>] ? 
del_timer_sync+0x34/0x3e
May 11 13:09:55 zapat2 kernel: [28657.961401]  [<ffffffff81126062>] ? 
bdi_writeback_thread+0xc3/0x1fe
May 11 13:09:55 zapat2 kernel: [28657.961401]  [<ffffffff81125f9f>] ? 
wb_do_writeback+0x1de/0x1de
May 11 13:09:55 zapat2 kernel: [28657.961401]  [<ffffffff81125f9f>] ? 
wb_do_writeback+0x1de/0x1de
May 11 13:09:55 zapat2 kernel: [28657.961401]  [<ffffffff81063719>] ? 
kthread+0x7a/0x82
May 11 13:09:55 zapat2 kernel: [28657.961401]  [<ffffffff813701b4>] ? 
kernel_thread_helper+0x4/0x10
May 11 13:09:55 zapat2 kernel: [28657.961401]  [<ffffffff8106369f>] ? 
kthread_worker_fn+0x147/0x147
May 11 13:09:55 zapat2 kernel: [28657.961401]  [<ffffffff813701b0>] ? 
gs_change+0x13/0x13
May 11 13:09:55 zapat2 kernel: [28657.961401] Code: fe 48 89 df e8 53 fd ff ff 
83 7b 38 00 0f 84 93 00 00 00 e9 e8 00 00 00 49 8b 06 f6 c4 08 75 04 0f 0b eb 
fe 49 8b 46 30 49 89 c6 <49> 8b 16 80 e2 04 74 04 0f 0b eb fe 49 8b 16 80 e6 02 
75 0c 49 
May 11 13:09:55 zapat2 kernel: [28657.961401] RIP  [<ffffffffa012151a>] 
write_cache_pages_da+0x1fa/0x313 [ext4]
May 11 13:09:55 zapat2 kernel: [28657.961401]  RSP <ffff881e274719e0>
May 11 13:09:55 zapat2 kernel: [28658.046508] ---[ end trace 0255c3fa97ad2fe5 
]---
May 11 13:09:55 zapat2 kernel: [28658.048528] ------------[ cut here 
]------------
May 11 13:09:55 zapat2 kernel: [28658.049669] kernel BUG at 
/build/buildd-linux_3.2.41-2~bpo60+1-amd64-mOM5rH/linux-3.2.41/fs/jbd2/transaction.c:334!
May 11 13:09:55 zapat2 kernel: [28658.052006] invalid opcode: 0000 [#2] SMP 
May 11 13:09:55 zapat2 kernel: [28658.052006] CPU 0 
May 11 13:09:55 zapat2 kernel: [28658.052006] Modules linked in: openafs(P) 
nls_utf8 isofs des_generic cbc rpcsec_gss_krb5 nfsd nfs lockd fscache 
auth_rpcgss nfs_acl sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state 
nf_conntrack xt_comment xt_multiport iptable_filter ip_tables x_tables ext2 
ib_ipoib mlx4_ib rdma_ucm rdma_cm ib_cm iw_cm ib_addr ib_sa ib_uverbs mlx4_en 
mlx4_core ib_umad ib_mad ib_core loop snd_pcm snd_timer snd soundcore 
snd_page_alloc crc32c_intel ghash_clmulni_intel aesni_intel psmouse i2c_piix4 
cryptd pcspkr aes_x86_64 i2c_core tpm_tis tpm i6300esb button tpm_bios evdev 
aes_generic processor thermal_sys serio_raw virtio_balloon ext4 mbcache jbd2 
crc16 dm_mod microcode virtio_net ata_generic virtio_blk virtio_pci floppy 
virtio_ring virtio uhci_hcd ata_piix ehci_hcd libata usbcore scsi_mod 
usb_common [last unloaded: scsi_wait_scan]
May 11 13:09:55 zapat2 kernel: [28658.052006] 
May 11 13:09:55 zapat2 kernel: [28658.052006] Pid: 18822, comm: flush-254:64 
Tainted: P      D    O 3.2.0-0.bpo.4-amd64 #1 Debian 3.2.41-2~bpo60+1 Bochs 
Bochs
May 11 13:09:55 zapat2 kernel: [28658.052006] RIP: 0010:[<ffffffffa00f5f4b>]  
[<ffffffffa00f5f4b>] jbd2__journal_start+0x40/0xce [jbd2]
May 11 13:09:55 zapat2 kernel: [28658.052006] RSP: 0000:ffff881e274713e8  
EFLAGS: 00010206
May 11 13:09:55 zapat2 kernel: [28658.052006] RAX: ffff880f060f4280 RBX: 
ffff880e59b46570 RCX: 0000000000000001
May 11 13:09:55 zapat2 kernel: [28658.052006] RDX: 0000000000000050 RSI: 
0000000000000002 RDI: ffff880f05308000
May 11 13:09:55 zapat2 kernel: [28658.052006] RBP: ffff881e27ee9120 R08: 
ffff881e27471428 R09: ffff881e27471828
May 11 13:09:55 zapat2 kernel: [28658.052006] R10: ffff881e27471fd8 R11: 
ffff881e27471798 R12: ffff880f05308000
May 11 13:09:55 zapat2 kernel: [28658.052006] R13: 0000000000000002 R14: 
0000000033a57950 R15: ffff881e274716b8
May 11 13:09:55 zapat2 kernel: [28658.052006] FS:  0000000000000000(0000) 
GS:ffff880f42c00000(0000) knlGS:0000000000000000
May 11 13:09:55 zapat2 kernel: [28658.052006] CS:  0010 DS: 0000 ES: 0000 CR0: 
000000008005003b
May 11 13:09:55 zapat2 kernel: [28658.052006] CR2: 00000000020f7068 CR3: 
0000000f060e1000 CR4: 00000000000406f0
May 11 13:09:55 zapat2 kernel: [28658.052006] DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000
May 11 13:09:55 zapat2 kernel: [28658.052006] DR3: 0000000000000000 DR6: 
00000000ffff0ff0 DR7: 0000000000000400
May 11 13:09:55 zapat2 kernel: [28658.052006] Process flush-254:64 (pid: 18822, 
threadinfo ffff881e27470000, task ffff881e27ee9120)
May 11 13:09:55 zapat2 kernel: [28658.052006] Stack:
May 11 13:09:55 zapat2 kernel: [28658.052006]  ffffffff81794190 
ffffffff814b6351 0000000000000400 ffff880f05356800
May 11 13:09:55 zapat2 kernel: [28658.052006]  ffff880f05308000 
0000000000000002 ffffffffa011e990 ffffffffa01376f4
May 11 13:09:55 zapat2 kernel: [28658.052006]  ffffffff813701af 
ffffffff8107b0e5 0000000000000013 0000000000000013
May 11 13:09:55 zapat2 kernel: [28658.052006] Call Trace:
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffffa011e990>] ? 
ext4_dirty_inode+0x17/0x49 [ext4]
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffffa01376f4>] ? 
ext4_journal_start_sb+0x145/0x152 [ext4]
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff813701af>] ? 
gs_change+0x12/0x13
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff8107b0e5>] ? 
kallsyms_lookup+0x7e/0xaf
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffffa011e990>] ? 
ext4_dirty_inode+0x17/0x49 [ext4]
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff81124f9d>] ? 
__mark_inode_dirty+0x22/0x1a7
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff81119548>] ? 
file_update_time+0xd4/0xff
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff810bde7c>] ? 
__generic_file_aio_write+0x15b/0x277
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff810bdff7>] ? 
generic_file_aio_write+0x5f/0xb3
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffffa0119a3f>] ? 
ext4_file_write+0x1ea/0x249 [ext4]
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff811c2536>] ? 
put_dec+0x2e/0x33
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff811c2678>] ? 
number+0x13d/0x243
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff811c2678>] ? 
number+0x13d/0x243
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff811066d2>] ? 
do_sync_write+0xba/0xf3
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff81081baf>] ? 
check_free_space+0x1f/0x139
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff8102d1b2>] ? 
pvclock_clocksource_read+0x46/0xb4
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff8106a692>] ? 
timekeeping_get_ns+0xd/0x2a
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff81082043>] ? 
do_acct_process+0x37a/0x3bc
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff810820e9>] ? 
acct_process+0x64/0x7d
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff8104cf53>] ? 
do_exit+0x265/0x799
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff81366df9>] ? 
printk+0x40/0x47
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff81049ea6>] ? 
kmsg_dump+0x53/0xef
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff813698fd>] ? 
oops_end+0x65/0xb6
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff81369949>] ? 
oops_end+0xb1/0xb6
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff81369005>] ? 
general_protection+0x25/0x30
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffffa012151a>] ? 
write_cache_pages_da+0x1fa/0x313 [ext4]
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffffa0121a7a>] ? 
ext4_da_writepages+0x29f/0x45d [ext4]
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffffa003b058>] ? 
virtqueue_kick+0x9/0x19 [virtio_ring]
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff811254ec>] ? 
writeback_single_inode+0x178/0x35e
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff8112599e>] ? 
writeback_sb_inodes+0x169/0x1ff
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff81125aa1>] ? 
__writeback_inodes_wb+0x6d/0xab
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff81125cc7>] ? 
wb_writeback+0x128/0x222
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff8136758c>] ? 
__schedule+0x5a0/0x5cd
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff81125f3a>] ? 
wb_do_writeback+0x179/0x1de
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff81055ccb>] ? 
del_timer_sync+0x34/0x3e
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff81126062>] ? 
bdi_writeback_thread+0xc3/0x1fe
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff81125f9f>] ? 
wb_do_writeback+0x1de/0x1de
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff81125f9f>] ? 
wb_do_writeback+0x1de/0x1de
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff81063719>] ? 
kthread+0x7a/0x82
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff813701b4>] ? 
kernel_thread_helper+0x4/0x10
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff8106369f>] ? 
kthread_worker_fn+0x147/0x147
May 11 13:09:55 zapat2 kernel: [28658.052006]  [<ffffffff813701b0>] ? 
gs_change+0x13/0x13
May 11 13:09:55 zapat2 kernel: [28658.052006] Code: 48 c7 c3 e2 ff ff ff 48 83 
ec 18 48 85 ff 48 8b 85 40 05 00 00 0f 84 90 00 00 00 48 85 c0 48 89 c3 74 11 
48 8b 00 48 39 38 74 04 <0f> 0b eb fe ff 43 0c eb 77 48 8b 3d 25 be 00 00 be 50 
00 00 00 
May 11 13:09:55 zapat2 kernel: [28658.052006] RIP  [<ffffffffa00f5f4b>] 
jbd2__journal_start+0x40/0xce [jbd2]
May 11 13:09:55 zapat2 kernel: [28658.052006]  RSP <ffff881e274713e8>
May 11 13:09:55 zapat2 kernel: [28658.182603] ---[ end trace 0255c3fa97ad2fe6 
]---
May 11 13:09:55 zapat2 kernel: [28658.183723] Fixing recursive fault but reboot 
is needed!
