Hi there,

I know some of the Spectrum Scale developers look at this list. I’m having a 
little trouble with support on this problem. 

We are seeing crashes with GPFS 5.0.4-1 Data Access Edition on KVM guests. The 
portability layer on these guests was installed via gpfs.gplbin RPMs that we 
built at our site and have used to install GPFS all over our environment. We 
have not seen this problem on any physical hosts so far, but have now 
experienced it on guests running on a number of our KVM hypervisors, across 
vendors, firmware versions, etc. At one time I thought it only happened on 
systems using Mellanox virtual functions for InfiniBand, but we’ve now seen it 
on VMs without VFs. There may be an SELinux interaction, but some of our hosts 
have it disabled outright, some are Permissive, and some were working 
successfully with GPFS 5.0.2.x. 

Support instructed me to run “mmbuildgpl”, and doing so has solved the problem. 
I don’t consider running “mmbuildgpl” a real solution, however. If RPMs are a 
supported means of installation, they should work. Support told me that they’d 
seen this solve the problem at another site as well.
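
One quick sanity check when comparing an RPM-installed portability layer with 
one rebuilt by mmbuildgpl is whether the installed modules were built for the 
running kernel. The following is a minimal sketch, not a diagnosis: it assumes 
that "modinfo" is on the PATH and that the module names match those in the 
crash output below (mmfslinux, mmfs26, tracedev). The helper names are 
hypothetical. A match does not rule out the crash; it only eliminates one 
obvious mismatch.

```python
import platform
import subprocess


def vermagic_kernel(vermagic: str) -> str:
    """Return the kernel release, i.e. the first field of a vermagic string."""
    fields = vermagic.split()
    return fields[0] if fields else ""


def built_for_running_kernel(vermagic: str, release: str) -> bool:
    """True if the module's vermagic names exactly the given kernel release."""
    return vermagic_kernel(vermagic) == release


def module_vermagic(module: str) -> str:
    """Ask modinfo for the vermagic field of an installed module."""
    result = subprocess.run(
        ["modinfo", "-F", "vermagic", module],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    release = platform.release()  # same value as `uname -r`
    # Module names taken from the "Modules linked in" line of the crash output.
    for module in ("mmfslinux", "mmfs26", "tracedev"):
        try:
            vermagic = module_vermagic(module)
        except (OSError, subprocess.CalledProcessError) as exc:
            print(f"{module}: could not query modinfo ({exc})")
            continue
        status = "ok" if built_for_running_kernel(vermagic, release) else "MISMATCH"
        print(f"{module}: built for {vermagic_kernel(vermagic)}, running {release}: {status}")
```

Running this once with the RPM-installed modules and once after mmbuildgpl 
would show whether the two builds at least target the same kernel release.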

Does anyone have any more information about this problem, whether there’s a 
fix in the pipeline, or whether something on our end could be causing it that 
we could remedy? Is there an easy place to see a list of eFixes to check 
whether this has come up? I know it’s very similar to a problem that appeared, 
I believe, after 5.0.2.2 and Linux 3.10.0-957.19.1, but that was already fixed 
in 5.0.3.x.

Below is a sample of the crash output:

[  156.733477] kernel BUG at mm/slub.c:3772!
[  156.734212] invalid opcode: 0000 [#1] SMP
[  156.735017] Modules linked in: ebtable_nat ebtable_filter ebtable_broute 
bridge stp llc ebtables mmfs26(OE) mmfslinux(OE) tracedev(OE) rdma_ucm(OE) 
ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) 
mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) ip6table_nat 
nf_nat_ipv6 ip6table_mangle ip6table_raw nf_conntrack_ipv6 nf_defrag_ipv6 
ip6table_filter ip6_tables iptable_nat nf_nat_ipv4 nf_nat iptable_mangle 
iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_comment xt_multiport 
xt_conntrack nf_conntrack iptable_filter iptable_security nfit libnvdimm ppdev 
iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper 
ablk_helper sg joydev pcspkr cryptd parport_pc parport i2c_piix4 virtio_balloon 
knem(OE) binfmt_misc ip_tables xfs libcrc32c mlx5_ib(OE) ib_uverbs(OE) 
ib_core(OE) sr_mod cdrom ata_generic pata_acpi virtio_console virtio_net 
virtio_blk crct10dif_pclmul crct10dif_common mlx5_core(OE) mlxfw(OE) 
crc32c_intel ptp pps_core devlink ata_piix serio_raw mlx_compat(OE) libata 
virtio_pci floppy virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
[  156.754814] CPU: 3 PID: 11826 Comm: request_handle* Tainted: G           OE  
------------   3.10.0-1062.9.1.el7.x86_64 #1
[  156.756782] Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
[  156.757978] task: ffff8aeca5bf8000 ti: ffff8ae9f7a24000 task.ti: 
ffff8ae9f7a24000
[  156.759326] RIP: 0010:[<ffffffffbbe23dec>]  [<ffffffffbbe23dec>] 
kfree+0x13c/0x140
[  156.760749] RSP: 0018:ffff8ae9f7a27278  EFLAGS: 00010246
[  156.761717] RAX: 001fffff00000400 RBX: ffffffffbc6974bf RCX: ffffa74dc1bcfb60
[  156.763030] RDX: 001fffff00000000 RSI: ffff8aed90fc6500 RDI: ffffffffbc6974bf
[  156.764321] RBP: ffff8ae9f7a27290 R08: 0000000000000014 R09: 0000000000000003
[  156.765612] R10: 0000000000000048 R11: ffffdb5a82d125c0 R12: ffffa74dc4fd36c0
[  156.766938] R13: ffffffffc0a1c562 R14: ffff8ae9f7a272f8 R15: ffff8ae9f7a27938
[  156.768229] FS:  00007f8ffff05700(0000) GS:ffff8aedbfd80000(0000) 
knlGS:0000000000000000
[  156.769708] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  156.770754] CR2: 000055963330e2b0 CR3: 0000000325ad2000 CR4: 00000000003606e0
[  156.772076] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  156.773367] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  156.774663] Call Trace:
[  156.775154]  [<ffffffffc0a1c562>] cxiInitInodeSecurityCleanup+0x12/0x20 
[mmfslinux]
[  156.776568]  [<ffffffffc0b50562>] 
_Z17newInodeInitLinuxP15KernelOperationP13gpfsVfsData_tPP8OpenFilePPvPP10gpfsNode_tP7FileUIDS6_N5LkObj12LockModeEnumE+0x152/0x290
 [mmfs26]
[  156.779378]  [<ffffffffc0b5cdfa>] 
_Z9gpfsMkdirP13gpfsVfsData_tP15KernelOperationP9cxiNode_tPPvPS4_PyS5_PcjjjP10ext_cred_t+0x46a/0x7e0
 [mmfs26]
[  156.781689]  [<ffffffffc0bdb928>] ? 
_ZN14BaseMutexClass15releaseLockHeldEP16KernelSynchState+0x18/0x130 [mmfs26]
[  156.783565]  [<ffffffffc0c3db2d>] 
_ZL21pcacheHandleCacheMissP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcPyP12pCacheResp_tPS5_PS4_PjSA_j+0x4bd/0x760
 [mmfs26]
[  156.786228]  [<ffffffffc0c40675>] 
_Z12pcacheLookupP13gpfsVfsData_tP15KernelOperationP10gpfsNode_tPvPcP7FilesetjjjPS5_PS4_PyPjS9_+0x1ff5/0x21a0
 [mmfs26]
[  156.788681]  [<ffffffffc0c023ef>] ? 
_Z15findFilesetByIdP15KernelOperationjjPP7Filesetj+0x4f/0xa0 [mmfs26]
[  156.790448]  [<ffffffffc0b6d59c>] 
_Z10gpfsLookupP13gpfsVfsData_tPvP9cxiNode_tS1_S1_PcjPS1_PS3_PyP10cxiVattr_tPjP10ext_cred_tjS5_PiS4_SD_+0x65c/0xad0
 [mmfs26]
[  156.793032]  [<ffffffffc0b8b022>] ? 
_Z33gpfsIsCifsBypassTraversalCheckingv+0xe2/0x130 [mmfs26]
[  156.794588]  [<ffffffffc0a36d96>] gpfs_i_lookup+0x2e6/0x5a0 [mmfslinux]
[  156.795838]  [<ffffffffc0b6cf40>] ? 
_Z8gpfsLinkP13gpfsVfsData_tP9cxiNode_tS2_PvPcjjP10ext_cred_t+0x6c0/0x6c0 
[mmfs26]
[  156.797753]  [<ffffffffbbe65d52>] ? __d_alloc+0x122/0x180
[  156.798763]  [<ffffffffbbe65e10>] ? d_alloc+0x60/0x70
[  156.799700]  [<ffffffffbbe556d3>] lookup_real+0x23/0x60
[  156.800651]  [<ffffffffbbe560f2>] __lookup_hash+0x42/0x60
[  156.801675]  [<ffffffffbc377874>] lookup_slow+0x42/0xa7
[  156.802634]  [<ffffffffbbe5ac3f>] link_path_walk+0x80f/0x8b0
[  156.803666]  [<ffffffffbbe5ae4a>] path_lookupat+0x7a/0x8b0
[  156.804690]  [<ffffffffbbdcd2fe>] ? lru_cache_add+0xe/0x10
[  156.805690]  [<ffffffffbbe24ef5>] ? kmem_cache_alloc+0x35/0x1f0
[  156.806766]  [<ffffffffbbe5c45f>] ? getname_flags+0x4f/0x1a0
[  156.807817]  [<ffffffffbbe5b6ab>] filename_lookup+0x2b/0xc0
[  156.808834]  [<ffffffffbbe5d5f7>] user_path_at_empty+0x67/0xc0
[  156.809923]  [<ffffffffbbdf3ecd>] ? handle_mm_fault+0x39d/0x9b0
[  156.811017]  [<ffffffffbbe5d661>] user_path_at+0x11/0x20
[  156.811983]  [<ffffffffbbe50343>] vfs_fstatat+0x63/0xc0
[  156.812951]  [<ffffffffbbe506fe>] SYSC_newstat+0x2e/0x60
[  156.813931]  [<ffffffffbc388a26>] ? trace_do_page_fault+0x56/0x150
[  156.815050]  [<ffffffffbbe50bbe>] SyS_newstat+0xe/0x10
[  156.816010]  [<ffffffffbc38dede>] system_call_fastpath+0x25/0x2a
[  156.817104] Code: 49 8b 03 31 f6 f6 c4 40 74 04 41 8b 73 68 4c 89 df e8 89 
2f fa ff eb 84 4c 8b 58 30 48 8b 10 80 e6 80 4c 0f 44 d8 e9 28 ff ff ff <0f> 0b 
66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54
[  156.822192] RIP  [<ffffffffbbe23dec>] kfree+0x13c/0x140
[  156.823180]  RSP <ffff8ae9f7a27278>
[  156.823872] ---[ end trace 142960be4a4feed8 ]---
[  156.824806] Kernel panic - not syncing: Fatal exception
[  156.826475] Kernel Offset: 0x3ac00000 from 0xffffffff81000000 (relocation 
range: 0xffffffff80000000-0xffffffffbfffffff)
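
Because the mailer wrapped the trace, it is hard to see at a glance which 
frames belong to which module. The following is a minimal sketch, assuming a 
log wrapped like the one above; the function and type names are hypothetical. 
It re-joins the wrapped lines and lists each call-trace frame with its module. 
Frames the kernel prefixes with "?" are addresses found on the stack that the 
unwinder could not confirm as part of the call chain, so they are flagged as 
unreliable.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# A new log record starts with a timestamp such as "[  156.775154]".
RECORD_START = re.compile(r"^\[\s*\d+\.\d+\]")
# A call-trace record: timestamp, [<address>], optional "?", symbol+off/size,
# and an optional "[module]" suffix for symbols outside the core kernel.
FRAME = re.compile(
    r"^\[\s*\d+\.\d+\]\s+\[<[0-9a-f]+>\]\s+(\?\s+)?(\S+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


class Frame(NamedTuple):
    symbol: str
    module: Optional[str]  # None for symbols in the core kernel
    reliable: bool         # False when the frame was printed with "?"


def unwrap(lines: Iterable[str]) -> List[str]:
    """Re-join lines that the mailer wrapped onto continuation lines."""
    records: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def parse_frames(lines: Iterable[str]) -> List[Frame]:
    """Extract call-trace frames; RIP and register lines do not match."""
    frames = []
    for record in unwrap(lines):
        m = FRAME.match(record)
        if m:
            frames.append(Frame(m.group(2), m.group(3), m.group(1) is None))
    return frames


if __name__ == "__main__":
    import sys
    for frame in parse_frames(sys.stdin):
        mark = " " if frame.reliable else "?"
        print(f"{mark} {frame.module or 'kernel':<10} {frame.symbol}")
```

Fed the trace above on standard input, this prints one line per frame, which 
makes it easier to see where the trace crosses between mmfs26, mmfslinux and 
the core kernel.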

--
____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - [email protected]
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss