Hello,

We currently have a Ceph cluster backing an OpenShift cluster, using
CephFS and dynamic RBD provisioning. The client nodes appear to be
hitting a kernel bug: they reboot unexpectedly, with the same message
each time. The clients are running CentOS 7:

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-514.10.2.el7.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2017-05-02-09:06:17/vmcore  [PARTIAL DUMP]
        CPUS: 16
        DATE: Tue May  2 09:06:15 2017
      UPTIME: 00:43:14
LOAD AVERAGE: 1.52, 1.40, 1.48
       TASKS: 7408
    NODENAME: [redacted]
     RELEASE: 3.10.0-514.10.2.el7.x86_64
     VERSION: #1 SMP Fri Mar 3 00:04:05 UTC 2017
     MACHINE: x86_64  (1997 Mhz)
      MEMORY: 32 GB
       PANIC: "kernel BUG at fs/ceph/inode.c:1197!"
         PID: 133
     COMMAND: "kworker/1:1"
        TASK: ffff8801399bde20  [THREAD_INFO: ffff880138d0c000]
         CPU: 1
       STATE: TASK_RUNNING (PANIC)

[ 2596.061470] ------------[ cut here ]------------
[ 2596.061499] kernel BUG at fs/ceph/inode.c:1197!
[ 2596.061516] invalid opcode: 0000 [#1] SMP
[ 2596.061535] Modules linked in: cfg80211 rfkill binfmt_misc veth ext4
mbcache jbd2 rbd xt_statistic xt_nat xt_recent ipt_REJECT nf_reject_ipv4
xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_addrtype br_netfilter
bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio
loop fuse ceph libceph dns_resolver vport_vxlan vxlan ip6_udp_tunnel
udp_tunnel openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6
iptable_nat nf_nat_ipv4 nf_nat xt_limit nf_log_ipv4
vmw_vsock_vmci_transport nf_log_common xt_LOG vsock nf_conntrack_ipv4
nf_defrag_ipv4 xt_comment xt_multiport xt_conntrack nf_conntrack
iptable_filter intel_powerclamp coretemp iosf_mbi crc32_pclmul
ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper
cryptd ppdev vmw_balloon pcspkr sg vmw_vmci shpchp i2c_piix4 parport_pc
[ 2596.061875]  parport nfsd nfs_acl lockd auth_rpcgss grace sunrpc
ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic
ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect
sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common mptspi
crc32c_intel drm ata_piix scsi_transport_spi serio_raw mptscsih libata
mptbase vmxnet3 i2c_core fjes dm_mirror dm_region_hash dm_log dm_mod
[ 2596.062042] CPU: 1 PID: 133 Comm: kworker/1:1 Not tainted
3.10.0-514.10.2.el7.x86_64 #1
[ 2596.062070] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
Desktop Reference Platform, BIOS 6.00 09/17/2015
[ 2596.062118] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[ 2596.062140] task: ffff8801399bde20 ti: ffff880138d0c000 task.ti:
ffff880138d0c000
[ 2596.062166] RIP: 0010:[<ffffffffa05d96c3>]  [<ffffffffa05d96c3>]
ceph_fill_trace+0x893/0xa00 [ceph]
[ 2596.062209] RSP: 0000:ffff880138d0fb80  EFLAGS: 00010287
[ 2596.062230] RAX: ffff88083b079680 RBX: ffff8801efe86760 RCX:
ffff880095e26c00
[ 2596.062257] RDX: ffff880003e8f2c0 RSI: ffff88053b4c0a08 RDI:
ffff88053b4c0a00
[ 2596.062288] RBP: ffff880138d0fbf8 R08: ffff880003e8f2c0 R09:
0000000000000000
[ 2596.062320] R10: 0000000000000001 R11: ffff8804256f3ac0 R12:
ffff880121d15400
[ 2596.062351] R13: ffff880138dd4000 R14: ffff88007053f280 R15:
ffff8807ee10f2c0
[ 2596.062379] FS:  0000000000000000(0000) GS:ffff88013b840000(0000)
knlGS:0000000000000000
[ 2596.062413] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2596.062436] CR2: 00007fe3bab2dcd0 CR3: 000000042ebe0000 CR4:
00000000001407e0
[ 2596.062498] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 2596.062540] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 2596.062567] Stack:
[ 2596.062578]  ffff880121d15778 ffff880121d15718 ffff880138d0fc50
ffff880095e26e7a
[ 2596.062612]  ffff880035c12400 ffff88053b4c7800 000000003b4c0800
ffff880138d0fbb8
[ 2596.062645]  ffff880138d0fbb8 00000000a5446715 ffff88053b4c0800
ffff88008238ee10
[ 2596.062681] Call Trace:
[ 2596.062703]  [<ffffffffa05f96a8>] handle_reply+0x3e8/0xc80 [ceph]
[ 2596.062736]  [<ffffffffa05fbd39>] dispatch+0xd9/0xaf0 [ceph]
[ 2596.062762]  [<ffffffff815559ca>] ? kernel_recvmsg+0x3a/0x50
[ 2596.062790]  [<ffffffffa057ceff>] try_read+0x4bf/0x1220 [libceph]
[ 2596.062819]  [<ffffffffa057b743>] ? try_write+0xa13/0xe60 [libceph]
[ 2596.062851]  [<ffffffffa057dd19>] ceph_con_workfn+0xb9/0x650 [libceph]
[ 2596.062878]  [<ffffffff810a810b>] process_one_work+0x17b/0x470
[ 2596.062902]  [<ffffffff810a8f46>] worker_thread+0x126/0x410
[ 2596.062925]  [<ffffffff810a8e20>] ? rescuer_thread+0x460/0x460
[ 2596.062949]  [<ffffffff810b06ff>] kthread+0xcf/0xe0
[ 2596.064014]  [<ffffffff810b0630>] ? kthread_create_on_node+0x140/0x140
[ 2596.065010]  [<ffffffff81696a58>] ret_from_fork+0x58/0x90
[ 2596.065955]  [<ffffffff810b0630>] ? kthread_create_on_node+0x140/0x140
[ 2596.066945] Code: e8 c3 2b d6 e0 e9 ca fa ff ff 4c 89 fa 48 c7 c6 07
d0 60 a0 48 c7 c7 50 24 61 a0 31 c0 e8 a6 2b d6 e0 e9 cd fa ff ff 0f 0b
0f 0b <0f> 0b 0f 0b 48 8b 83 c8 fc ff ff
 4c 8b 89 c8 fc ff ff 4c 89 fa
[ 2596.069127] RIP  [<ffffffffa05d96c3>] ceph_fill_trace+0x893/0xa00 [ceph]
[ 2596.070120]  RSP <ffff880138d0fb80>


Just before the above, there are many messages like these, from all of
the Ceph node IPs:
[  933.282441] [IPTABLES:INPUT] dropped IN=eno33557248 OUT=
MAC=00:50:56:0f:9a:47:00:50:56:35:28:f1:08:00 SRC=192.168.5.6
DST=192.168.3.2 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=20778 DF
PROTO=TCP SPT=6816 DPT=47140 WINDOW=2406 RES=0x00 ACK FIN URGP=0
[  933.922440] [IPTABLES:INPUT] dropped IN=eno33557248 OUT=
MAC=00:50:56:0f:9a:47:00:50:56:35:28:f1:08:00 SRC=192.168.5.6
DST=192.168.3.2 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=1440 DF
PROTO=TCP SPT=6800 DPT=56290 WINDOW=2889 RES=0x00 ACK FIN URGP=0
[  934.031555] [IPTABLES:INPUT] dropped IN=eno33557248 OUT=
MAC=00:50:56:0f:9a:47:00:50:56:26:f3:39:08:00 SRC=192.168.5.7
DST=192.168.3.2 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=58232 DF
PROTO=TCP SPT=6812 DPT=59564 WINDOW=8433 RES=0x00 ACK FIN URGP=0
[  934.031579] [IPTABLES:INPUT] dropped IN=eno33557248 OUT=
MAC=00:50:56:0f:9a:47:00:50:56:26:f3:39:08:00 SRC=192.168.5.7
DST=192.168.3.2 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=20084 DF
PROTO=TCP SPT=6816 DPT=55574 WINDOW=2925 RES=0x00 ACK FIN URGP=0
[  934.105440] [IPTABLES:INPUT] dropped IN=eno33557248 OUT=
MAC=00:50:56:0f:9a:47:00:50:56:37:f8:4c:08:00 SRC=192.168.5.4
DST=192.168.3.2 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=48428 DF
PROTO=TCP SPT=6804 DPT=59156 WINDOW=6422 RES=0x00 ACK FIN URGP=0
[  935.133060] [IPTABLES:INPUT] dropped IN=eno33557248 OUT=
MAC=00:50:56:0f:9a:47:00:50:56:0d:13:27:08:00 SRC=192.168.5.3
DST=192.168.3.2 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=35384 DF
PROTO=TCP SPT=6817 DPT=52674 WINDOW=24576 RES=0x00 ACK FIN URGP=0
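
In case it helps anyone tally these drops on their own clients, here is
a minimal Python sketch that counts dropped packets per source IP and
lists the source ports seen. It assumes the kernel log has been saved
as text and that the "[IPTABLES:INPUT] dropped" prefix matches the one
above. The function names are my own and purely illustrative, not part
of any Ceph or iptables tooling.

```python
import re
from collections import Counter

# One dropped-packet record. DOTALL lets a record span mail-wrapped lines.
# The "[IPTABLES:INPUT] dropped" prefix is site-specific (assumption).
RECORD = re.compile(
    r"\[IPTABLES:INPUT\] dropped.*?SRC=(?P<src>\S+)\s+DST=(?P<dst>\S+)"
    r".*?SPT=(?P<spt>\d+)\s+DPT=(?P<dpt>\d+)",
    re.DOTALL,
)


def count_drops_by_source(log_text):
    """Return a Counter mapping source IP -> number of dropped packets."""
    return Counter(m.group("src") for m in RECORD.finditer(log_text))


def source_ports(log_text):
    """Return the sorted, de-duplicated source ports of dropped packets."""
    return sorted({int(m.group("spt")) for m in RECORD.finditer(log_text)})
```

Feeding it the saved log text gives a per-node count, which makes it
easier to check whether the drops really come from every Ceph node.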

Many thanks

James
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
