Hello #linux-cluster,

David's guide (http://people.redhat.com/teigland/cluster4-gfs2-dlm.txt) was quite helpful in my attempt to run GFS2 with a minimal cluster setup, on top of a DRBD mirror.
Instead of using RHEL, I built all the required pieces from scratch on Slackware Linux current (Aug 2020): drbd v9, drbd-utils, libqb, kronosnet, corosync and userland dlm (https://pagure.io/dlm). The DRBD node names are `pro5s1` and `pro5s2`, and the Corosync/DLM/GFS2 cluster name is `pro5s`.

It seems I have found a bug. Your opinion, and advice on how to deal with it, would be much appreciated. Thank you. Please just don't tell me to switch to RHEL or CentOS: I assume these fine pieces of source are meant to work anywhere. Besides, I hope this kind of feedback is of interest to you.

Suspected causes: either the fact that OCFS2 is also mentioned in dmesg (?!), or some network-related discrepancy, since the cluster network goes through a VLAN I have no control over.

uname -a
pro5s1: Linux pro5s1 5.4.58 #1 SMP Tue Aug 11 14:42:19 CDT 2020 x86_64 Intel(R) Xeon(R) CPU E3-1240 v6 @ 3.70GHz GenuineIntel GNU/Linux
pro5s2: Linux pro5s2 5.4.58 #1 SMP Tue Aug 11 14:42:19 CDT 2020 x86_64 Intel(R) Xeon(R) CPU E3-1240 v6 @ 3.70GHz GenuineIntel GNU/Linux

cat /proc/cmdline
pro5s1: BOOT_IMAGE=/boot/vmlinuz root=/dev/sda1 ro mitigations=off vga=791
pro5s2: BOOT_IMAGE=/boot/vmlinuz root=/dev/sda1 ro mitigations=off vga=791

dmesg | grep -E 'drbd.*Version'
pro5s1: [ 11.735205] drbd: initialized. Version: 9.0.24-1 (api:2/proto:86-117)
pro5s2: [ 11.753043] drbd: initialized. Version: 9.0.24-1 (api:2/proto:86-117)

drbdadm -V
pro5s1: DRBDADM_BUILDTAG=GIT-hash:\ a513dea1cf000164fd87e56525d098a426131a86\ build\ by\ root@pro5s1\,\ 2020-08-17\ 06:59:43
pro5s1: DRBDADM_API_VERSION=2
pro5s1: DRBD_KERNEL_VERSION_CODE=0x090018
pro5s1: DRBD_KERNEL_VERSION=9.0.24
pro5s1: DRBDADM_VERSION_CODE=0x090d00
pro5s1: DRBDADM_VERSION=9.13.0
pro5s2: DRBDADM_BUILDTAG=GIT-hash:\ a513dea1cf000164fd87e56525d098a426131a86\ build\ by\ root@pro5s2\,\ 2020-08-17\ 07:00:55
pro5s2: DRBDADM_API_VERSION=2
pro5s2: DRBD_KERNEL_VERSION_CODE=0x090018
pro5s2: DRBD_KERNEL_VERSION=9.0.24
pro5s2: DRBDADM_VERSION_CODE=0x090d00
pro5s2: DRBDADM_VERSION=9.13.0

zcat /proc/config.gz | grep -E 'GFS2|DLM'
pro5s1: CONFIG_GFS2_FS=m
pro5s1: CONFIG_GFS2_FS_LOCKING_DLM=y
pro5s1: CONFIG_DLM=m
pro5s1: # CONFIG_DLM_DEBUG is not set
pro5s2: CONFIG_GFS2_FS=m
pro5s2: CONFIG_GFS2_FS_LOCKING_DLM=y
pro5s2: CONFIG_DLM=m
pro5s2: # CONFIG_DLM_DEBUG is not set

dmesg | grep -Ei 'gfs2|dlm'
pro5s1: [ 0.449985] OCFS2 User DLM kernel interface loaded
pro5s1: [ 11.080928] DLM installed
pro5s1: [ 11.087535] gfs2: GFS2 installed
pro5s2: [ 0.447813] OCFS2 User DLM kernel interface loaded
pro5s2: [ 57.100751] DLM installed
pro5s2: [ 57.107343] gfs2: GFS2 installed

---> I am not sure why OCFS2 is mentioned here. There are indeed a few related modules available:

/lib/modules/5.4.58/kernel/fs/dlm
/lib/modules/5.4.58/kernel/fs/dlm/dlm.ko
/lib/modules/5.4.58/kernel/fs/ocfs2/dlm
/lib/modules/5.4.58/kernel/fs/ocfs2/dlm/ocfs2_dlm.ko
/lib/modules/5.4.58/kernel/fs/ocfs2/dlmfs
/lib/modules/5.4.58/kernel/fs/ocfs2/dlmfs/ocfs2_dlmfs.ko

but as far as I know it's fs/dlm/dlm.ko that gets loaded, not ocfs2_dlm.ko.

corosync -v
pro5s1: Corosync Cluster Engine, version '3.0.4'
pro5s1: Copyright (c) 2006-2018 Red Hat, Inc.
pro5s2: Corosync Cluster Engine, version '3.0.4'
pro5s2: Copyright (c) 2006-2018 Red Hat, Inc.
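One way to narrow down the OCFS2 question is to ask the kernel configuration directly, in the same way as the `zcat /proc/config.gz` check above. An option set to `y` is compiled into the kernel image and never shows up in `lsmod`, whereas `m` means a loadable .ko. The early timestamp of the OCFS2 message (about 0.45 s, well before `DLM installed`) and the fact that the `lsmod | grep -E 'dlm|gfs2|drbd'` output below lists no ocfs2 module (an `ocfs2_dlmfs` module would match that grep) suggest, but do not prove, that this interface is built in. The `config_state` function is a hypothetical helper written for this sketch, not part of any package:

```shell
#!/bin/sh
# config_state OPTION
# Hypothetical helper: reads a kernel config (e.g. `zcat /proc/config.gz`)
# on stdin and reports how CONFIG_<OPTION> is set:
#   built-in -> "=y", compiled into the kernel image (never listed by lsmod)
#   module   -> "=m", built as a loadable .ko
#   not set  -> explicitly disabled ("# CONFIG_... is not set")
#   absent   -> the option does not appear in this config at all
config_state() {
  awk -v opt="CONFIG_$1" '
    $0 == opt "=y"               { print "built-in"; found = 1; exit }
    $0 == opt "=m"               { print "module";   found = 1; exit }
    $0 == "# " opt " is not set" { print "not set";  found = 1; exit }
    END { if (!found) print "absent" }
  '
}
```

On each node, for example: zcat /proc/config.gz | config_state OCFS2_FS. Whole lines are compared exactly, so that CONFIG_GFS2_FS is not confused with CONFIG_GFS2_FS_LOCKING_DLM, which a plain grep would also match.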
For happy-happy testing, fencing is disabled:

cat /etc/dlm/dlm.conf
pro5s1: enable_fencing=0
pro5s2: enable_fencing=0

lsmod | grep -E 'dlm|gfs2|drbd'
pro5s1: drbd_transport_tcp 28672 3
pro5s1: drbd 638976 4 drbd_transport_tcp
pro5s1: gfs2 503808 0
pro5s1: dlm 196608 9 gfs2
pro5s2: gfs2 503808 0
pro5s2: dlm 196608 9 gfs2
pro5s2: drbd_transport_tcp 24576 3
pro5s2: drbd 544768 4 drbd_transport_tcp

There is no need to make the second node primary yet (dual-primary mode is enabled, so it could be promoted if needed, but we're not that far yet):

drbdadm status res-data
pro5s1: res-data role:Primary
pro5s1:   disk:UpToDate
pro5s1:   pro5s2 role:Secondary
pro5s1:     peer-disk:UpToDate
pro5s1:
pro5s2: res-data role:Secondary
pro5s2:   disk:UpToDate
pro5s2:   pro5s1 role:Primary
pro5s2:     peer-disk:UpToDate
pro5s2:

pgrep -a dlm
pro5s1: 172 user_dlm
pro5s1: 1038 /usr/sbin/dlm_controld
pro5s1: 1039 /usr/sbin/dlm_controld
pro5s2: 172 user_dlm
pro5s2: 1033 /usr/sbin/dlm_controld
pro5s2: 1042 /usr/sbin/dlm_controld

corosync-quorumtool
Quorum information
------------------
Date:             Mon Aug 17 07:27:25 2020
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          1
Ring ID:          1.109
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1
Flags:            2Node Quorate WaitForAll

Membership information
----------------------
    Nodeid      Votes Name
         1          1 pro5s1 (local)
         2          1 pro5s2

Now mounting GFS2:

pro5s1# mount -t gfs2 /dev/drbd0 /data
pro5s1# ll /data
total 8.0K
drwxr-xr-x  2 root root 3.8K Aug 11 20:27 ./
drwxr-xr-x 24 root root 4.0K Aug 17 06:44 ../

The logs look fine so far:

==> /var/log/messages <==
Aug 17 07:31:24 pro5s1 kernel: [ 1559.513131] gfs2: fsid=pro5s:data: Trying to join cluster "lock_dlm", "pro5s:data"
Aug 17 07:31:24 pro5s1 kernel: [ 1559.513535] dlm: data: joining the lockspace group...
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516905] dlm: data: dlm_recover 1
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516924] dlm: data: add member 1
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516925] dlm: data: dlm_recover_members 1 nodes
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516926] dlm: data: generation 1 slots 1 1:1
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516926] dlm: data: dlm_recover_directory
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516927] dlm: data: dlm_recover_directory 0 in 0 new
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516928] dlm: data: dlm_recover_directory 0 out 0 messages
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516930] dlm: data: group event done 0 0
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516931] dlm: data: join complete
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516937] dlm: data: dlm_recover 1 generation 1 done: 0 ms
Aug 17 07:31:25 pro5s1 kernel: [ 1560.018339] gfs2: fsid=pro5s:data: first mounter control generation 0
Aug 17 07:31:25 pro5s1 kernel: [ 1560.018340] gfs2: fsid=pro5s:data: Joined cluster. Now mounting FS...
Aug 17 07:31:25 pro5s1 kernel: [ 1560.024737] gfs2: fsid=pro5s:data.0: journal 0 mapped with 1 extents in 0ms
Aug 17 07:31:25 pro5s1 kernel: [ 1560.024741] gfs2: fsid=pro5s:data.0: jid=0, already locked for use
Aug 17 07:31:25 pro5s1 kernel: [ 1560.024741] gfs2: fsid=pro5s:data.0: jid=0: Looking at journal...
Aug 17 07:31:25 pro5s1 kernel: [ 1560.064018] gfs2: fsid=pro5s:data.0: jid=0: Journal head lookup took 39ms
Aug 17 07:31:25 pro5s1 kernel: [ 1560.064077] gfs2: fsid=pro5s:data.0: jid=0: Done
Aug 17 07:31:25 pro5s1 kernel: [ 1560.064119] gfs2: fsid=pro5s:data.0: jid=1: Trying to acquire journal lock...
Aug 17 07:31:25 pro5s1 kernel: [ 1560.064343] gfs2: fsid=pro5s:data.0: jid=1: Looking at journal...
Aug 17 07:31:25 pro5s1 kernel: [ 1560.066188] gfs2: fsid=pro5s:data.0: journal 1 mapped with 1 extents in 0ms
Aug 17 07:31:25 pro5s1 kernel: [ 1560.114308] gfs2: fsid=pro5s:data.0: jid=1: Journal head lookup took 49ms
Aug 17 07:31:25 pro5s1 kernel: [ 1560.114338] gfs2: fsid=pro5s:data.0: jid=1: Done
Aug 17 07:31:25 pro5s1 kernel: [ 1560.114352] gfs2: fsid=pro5s:data.0: first mount done, others may mount

==> /var/log/syslog <==
Aug 17 07:31:24 pro5s1 kernel: [ 1559.513371] dlm: Using TCP for communications

Then trying to write something:

pro5s1# echo ok > /data/file-check

Oops, here comes the bug:

==> /var/log/messages <==
Aug 17 07:32:30 pro5s1 kernel: [ 1625.215500] PGD 8435be067 P4D 8435be067 PUD 844829067 PMD 0

==> /var/log/syslog <==
Aug 17 07:32:30 pro5s1 kernel: [ 1625.214399] BUG: kernel NULL pointer dereference, address: 0000000000000008
Aug 17 07:32:30 pro5s1 kernel: [ 1625.214774] #PF: supervisor write access in kernel mode
Aug 17 07:32:30 pro5s1 kernel: [ 1625.215140] #PF: error_code(0x0002) - not-present page
Aug 17 07:32:30 pro5s1 kernel: [ 1625.215860] Oops: 0002 [#1] SMP NOPTI
Aug 17 07:32:30 pro5s1 kernel: [ 1625.216218] CPU: 0 PID: 1060 Comm: bash Tainted: G O 5.4.58 #1
Aug 17 07:32:30 pro5s1 kernel: [ 1625.216583] Hardware name: Quanta Cloud Technology Inc. QuantaMicro X10E-9N/S3E-MB, BIOS S3E_3B09.02 02/23/2018
Aug 17 07:32:30 pro5s1 kernel: [ 1625.216974] RIP: 0010:gfs2_log_commit+0xf4/0x3f0 [gfs2]
Aug 17 07:32:30 pro5s1 kernel: [ 1625.217358] Code: 48 89 45 60 48 8d bb ec 08 00 00 e8 16 12 94 dd 48 8b 55 70 48 8d 45 70 48 39 d0 74 29 49 8b 4c 24 78 48 8b 75 70 48 8b 55 78 <48> 89 4e 08 48 89 31 49 8d 4c 24 70 48 89 0a 49 89 54 24 78 48 89
Aug 17 07:32:30 pro5s1 kernel: [ 1625.218155] RSP: 0018:ffffb406403f7b38 EFLAGS: 00010282
Aug 17 07:32:30 pro5s1 kernel: [ 1625.218549] RAX: ffffa0bc0341bbb0 RBX: ffffa0bc0abc6000 RCX: 0000000000000000
Aug 17 07:32:30 pro5s1 kernel: [ 1625.218947] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa0bc0abc68ec
Aug 17 07:32:30 pro5s1 kernel: [ 1625.219346] RBP: ffffa0bc0341bb40 R08: 0000000000000001 R09: ffffa0bc00b235d0
Aug 17 07:32:30 pro5s1 kernel: [ 1625.219742] R10: 78445afabba0ffff R11: 0000000000000001 R12: ffffa0bc0341b840
Aug 17 07:32:30 pro5s1 kernel: [ 1625.220136] R13: ffffa0bc0abc6878 R14: ffffa0bc00adb9a8 R15: ffffa0bc0abc6000
Aug 17 07:32:30 pro5s1 kernel: [ 1625.220531] FS: 00007f46af6ad740(0000) GS:ffffa0bc0fa00000(0000) knlGS:0000000000000000
Aug 17 07:32:30 pro5s1 kernel: [ 1625.220963] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 17 07:32:30 pro5s1 kernel: [ 1625.221361] CR2: 0000000000000008 CR3: 00000008448aa002 CR4: 00000000003606f0
Aug 17 07:32:30 pro5s1 kernel: [ 1625.221774] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 17 07:32:30 pro5s1 kernel: [ 1625.222168] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 17 07:32:30 pro5s1 kernel: [ 1625.222551] Call Trace:
Aug 17 07:32:30 pro5s1 kernel: [ 1625.222931] gfs2_trans_end+0x7d/0x160 [gfs2]
Aug 17 07:32:30 pro5s1 kernel: [ 1625.223305] gfs2_create_inode+0xb5e/0x1330 [gfs2]
Aug 17 07:32:30 pro5s1 kernel: [ 1625.223670] ? gfs2_create_inode+0x103/0x1330 [gfs2]
Aug 17 07:32:30 pro5s1 kernel: [ 1625.224028] ? gfs2_create_inode+0x9dc/0x1330 [gfs2]
Aug 17 07:32:30 pro5s1 kernel: [ 1625.224383] gfs2_atomic_open+0x56/0xe0 [gfs2]
Aug 17 07:32:30 pro5s1 kernel: [ 1625.224737] path_openat+0x8ea/0x1570
Aug 17 07:32:30 pro5s1 kernel: [ 1625.225106] do_filp_open+0x91/0x100
Aug 17 07:32:30 pro5s1 kernel: [ 1625.225436] ? __check_object_size+0x136/0x147
Aug 17 07:32:30 pro5s1 kernel: [ 1625.225764] do_sys_open+0x184/0x220
Aug 17 07:32:30 pro5s1 kernel: [ 1625.226094] do_syscall_64+0x4c/0x170
Aug 17 07:32:30 pro5s1 kernel: [ 1625.226426] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug 17 07:32:30 pro5s1 kernel: [ 1625.226767] RIP: 0033:0x7f46af7b65d7
Aug 17 07:32:30 pro5s1 kernel: [ 1625.227105] Code: 25 00 00 41 00 3d 00 00 41 00 74 37 64 8b 04 25 18 00 00 00 85 c0 75 5b 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 85 00 00 00 48 83 c4 68 5d 41 5c c3 0f 1f
Aug 17 07:32:30 pro5s1 kernel: [ 1625.227812] RSP: 002b:00007fff93fb0220 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Aug 17 07:32:30 pro5s1 kernel: [ 1625.228170] RAX: ffffffffffffffda RBX: 0000000001889f08 RCX: 00007f46af7b65d7
Aug 17 07:32:30 pro5s1 kernel: [ 1625.228536] RDX: 0000000000000241 RSI: 000000000181ad48 RDI: 00000000ffffff9c
Aug 17 07:32:30 pro5s1 kernel: [ 1625.228957] RBP: 000000000181ad48 R08: 0000000000000000 R09: 0000000000000020
Aug 17 07:32:30 pro5s1 kernel: [ 1625.229319] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000241
Aug 17 07:32:30 pro5s1 kernel: [ 1625.229683] R13: 0000000000000000 R14: 0000000000000001 R15: 000000000181ad48
Aug 17 07:32:30 pro5s1 kernel: [ 1625.230048] Modules linked in: drbd_transport_tcp(O) drbd(O) bridge 8021q garp mrp stp llc ipv6 nf_defrag_ipv6 gfs2 dlm coretemp intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp kvm_intel joydev kvm hid_generic irqbypass ast usbhid drm_vram_helper ttm crct10dif_pclmul crc32_pclmul hid drm_kms_helper ghash_clmulni_intel evdev drm rapl agpgart ipmi_ssif fb_sys_fops dm_thin_pool igb intel_cstate syscopyarea dm_persistent_data dca sysfillrect i2c_i801 i2c_algo_bit dm_bio_prison sysimgblt dm_bufio i2c_core ipmi_si ipmi_devintf xhci_pci mei_me ipmi_msghandler acpi_power_meter pinctrl_sunrisepoint intel_pch_thermal xhci_hcd pinctrl_intel mei hwmon video acpi_pad thermal fan button loop
Aug 17 07:32:30 pro5s1 kernel: [ 1625.232548] CR2: 0000000000000008
Aug 17 07:32:30 pro5s1 kernel: [ 1625.233834] ---[ end trace ea2a9d0e210031cf ]---
Aug 17 07:32:30 pro5s1 kernel: [ 1625.236334] RIP: 0010:gfs2_log_commit+0xf4/0x3f0 [gfs2]
Aug 17 07:32:30 pro5s1 kernel: [ 1625.236808] Code: 48 89 45 60 48 8d bb ec 08 00 00 e8 16 12 94 dd 48 8b 55 70 48 8d 45 70 48 39 d0 74 29 49 8b 4c 24 78 48 8b 75 70 48 8b 55 78 <48> 89 4e 08 48 89 31 49 8d 4c 24 70 48 89 0a 49 89 54 24 78 48 89
Aug 17 07:32:30 pro5s1 kernel: [ 1625.237715] RSP: 0018:ffffb406403f7b38 EFLAGS: 00010282
Aug 17 07:32:30 pro5s1 kernel: [ 1625.238156] RAX: ffffa0bc0341bbb0 RBX: ffffa0bc0abc6000 RCX: 0000000000000000
Aug 17 07:32:30 pro5s1 kernel: [ 1625.238597] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa0bc0abc68ec
Aug 17 07:32:30 pro5s1 kernel: [ 1625.239032] RBP: ffffa0bc0341bb40 R08: 0000000000000001 R09: ffffa0bc00b235d0
Aug 17 07:32:30 pro5s1 kernel: [ 1625.239455] R10: 78445afabba0ffff R11: 0000000000000001 R12: ffffa0bc0341b840
Aug 17 07:32:30 pro5s1 kernel: [ 1625.239872] R13: ffffa0bc0abc6878 R14: ffffa0bc00adb9a8 R15: ffffa0bc0abc6000
Aug 17 07:32:30 pro5s1 kernel: [ 1625.240287] FS: 00007f46af6ad740(0000) GS:ffffa0bc0fa00000(0000) knlGS:0000000000000000
Aug 17 07:32:30 pro5s1 kernel: [ 1625.240706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 17 07:32:30 pro5s1 kernel: [ 1625.241157] CR2: 0000000000000008 CR3: 00000008448aa002 CR4: 00000000003606f0
Aug 17 07:32:30 pro5s1 kernel: [ 1625.241584] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 17 07:32:30 pro5s1 kernel: [ 1625.242007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

After that, one or two cores (out of 8) stay at 100% while corosync complains:

==> /var/log/messages <==
Aug 17 07:32:56 pro5s1 corosync[1022]: [KNET ] link: host: 2 link: 0 is down
Aug 17 07:32:56 pro5s1 corosync[1022]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)

==> /var/log/syslog <==
Aug 17 07:32:56 pro5s1 corosync[1022]: [KNET ] host: host: 2 has no active links

==> /var/log/messages <==
Aug 17 07:32:56 pro5s1 last message buffered 1 times
Aug 17 07:32:56 pro5s1 corosync[1022]: [TOTEM ] Token has not been received in 750 ms
Aug 17 07:32:56 pro5s1 corosync[1022]: [TOTEM ] A processor failed, forming new configuration.
Aug 17 07:32:58 pro5s1 corosync[1022]: [TOTEM ] A new membership (1.10d) was formed. Members left: 2
Aug 17 07:32:58 pro5s1 corosync[1022]: [TOTEM ] Failed to receive the leave message. failed: 2
Aug 17 07:32:58 pro5s1 corosync[1022]: [QUORUM] Members[1]: 1
Aug 17 07:32:58 pro5s1 corosync[1022]: [MAIN ] Completed service synchronization, ready to provide service.

==> /var/log/syslog <==
Aug 17 07:32:56 pro5s1 last message buffered 1 times
Aug 17 07:32:58 pro5s1 kernel: [ 1653.007070] dlm: closing connection to node 2

==> /var/log/messages <==
Aug 17 07:33:53 pro5s1 kernel: [ 1708.732408] drbd res-sabotage pro5s2: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
Aug 17 07:33:53 pro5s1 kernel: [ 1708.732845] drbd res-sabotage/0 drbd1 pro5s2: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Aug 17 07:33:53 pro5s1 kernel: [ 1708.733297] drbd res-sabotage pro5s2: ack_receiver terminated
Aug 17 07:33:53 pro5s1 kernel: [ 1708.733753] drbd res-sabotage pro5s2: Terminating ack_recv thread
Aug 17 07:33:53 pro5s1 kernel: [ 1708.750928] drbd res-sabotage pro5s2: sock was shut down by peer

==> /var/log/syslog <==
Aug 17 07:33:53 pro5s1 kernel: [ 1708.731959] drbd res-sabotage pro5s2: PingAck did not arrive in time.
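To weigh the network suspicion, it may help to line up the timestamps: in the excerpts above the oops is logged at 07:32:30, while corosync first reports the KNET link down at 07:32:56. The `first_gap` function below is a hypothetical helper, not an existing tool, that extracts such a gap from syslog files. It assumes the traditional `Mon DD HH:MM:SS host ...` syslog prefix shown here and that both events happen on the same day. It only shows which event came first, not that one caused the other.

```shell
#!/bin/sh
# first_gap REGEX_A REGEX_B
# Hypothetical helper: reads syslog lines of the form
#   "Aug 17 07:32:30 host tag: message"
# on stdin, takes the first line matching each extended regex (in input
# order), and prints both times plus the seconds elapsed from A to B.
# Assumes both events fall on the same day (no midnight rollover).
# Returns non-zero if either pattern never matches.
first_gap() {
  awk -v a="$1" -v b="$2" '
    # "HH:MM:SS" -> seconds since midnight
    function secs(t,    p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    ta == "" && $0 ~ a { ta = $3 }
    tb == "" && $0 ~ b { tb = $3 }
    END {
      if (ta == "" || tb == "") { print "pattern not found"; exit 1 }
      printf "%s -> %s: %d s\n", ta, tb, secs(tb) - secs(ta)
    }
  '
}
```

For example: cat /var/log/syslog /var/log/messages | first_gap 'BUG: kernel NULL pointer' 'link: 0 is down'. Fed only the excerpts quoted above, it would print "07:32:30 -> 07:32:56: 26 s". On complete log files an earlier, unrelated link-down (from a previous reboot of the peer, say) would match first, so trimming the input to the incident window is advisable.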
And then there is another round of output:

==> /var/log/messages <==
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944069] Sending NMI from CPU 2 to CPUs 6:

==> /var/log/syslog <==
Aug 17 07:33:56 pro5s1 kernel: [ 1710.919936] rcu: INFO: rcu_sched self-detected stall on CPU
Aug 17 07:33:56 pro5s1 kernel: [ 1710.920537] rcu: 2-....: (59999 ticks this GP) idle=8fe/1/0x4000000000000002 softirq=9205/9205 fqs=15000 last_accelerate: 9bb3/8684, Nonlazy posted: .LD
Aug 17 07:33:56 pro5s1 kernel: [ 1710.921688] (t=60001 jiffies g=443285 q=4488)
Aug 17 07:33:56 pro5s1 kernel: [ 1710.922261] NMI backtrace for cpu 2
Aug 17 07:33:56 pro5s1 kernel: [ 1710.922837] CPU: 2 PID: 1188 Comm: gfs2_quotad Tainted: G D O 5.4.58 #1
Aug 17 07:33:56 pro5s1 kernel: [ 1710.923432] Hardware name: Quanta Cloud Technology Inc. QuantaMicro X10E-9N/S3E-MB, BIOS S3E_3B09.02 02/23/2018
Aug 17 07:33:56 pro5s1 kernel: [ 1710.924067] Call Trace:
Aug 17 07:33:56 pro5s1 kernel: [ 1710.924709] <IRQ>
Aug 17 07:33:56 pro5s1 kernel: [ 1710.925338] dump_stack+0x50/0x70
Aug 17 07:33:56 pro5s1 kernel: [ 1710.925975] nmi_cpu_backtrace.cold+0x14/0x53
Aug 17 07:33:56 pro5s1 kernel: [ 1710.926592] ? lapic_can_unplug_cpu.cold+0x39/0x39
Aug 17 07:33:56 pro5s1 kernel: [ 1710.927200] nmi_trigger_cpumask_backtrace+0xc5/0xc7
Aug 17 07:33:56 pro5s1 kernel: [ 1710.927812] rcu_dump_cpu_stacks+0x92/0xc0
Aug 17 07:33:56 pro5s1 kernel: [ 1710.928404] rcu_sched_clock_irq.cold+0x1b5/0x3a9
Aug 17 07:33:56 pro5s1 kernel: [ 1710.929006] ? trigger_load_balance+0x5a/0x210
Aug 17 07:33:56 pro5s1 kernel: [ 1710.929603] update_process_times+0x24/0x60
Aug 17 07:33:56 pro5s1 kernel: [ 1710.930199] tick_sched_handle+0x34/0x50
Aug 17 07:33:56 pro5s1 kernel: [ 1710.930798] tick_sched_timer+0x38/0x80
Aug 17 07:33:56 pro5s1 kernel: [ 1710.931382] ? tick_sched_do_timer+0x60/0x60
Aug 17 07:33:56 pro5s1 kernel: [ 1710.931956] __hrtimer_run_queues+0xf6/0x270
Aug 17 07:33:56 pro5s1 kernel: [ 1710.932511] hrtimer_interrupt+0x10e/0x240
Aug 17 07:33:56 pro5s1 kernel: [ 1710.933044] smp_apic_timer_interrupt+0x6c/0x130
Aug 17 07:33:56 pro5s1 kernel: [ 1710.933560] apic_timer_interrupt+0xf/0x20
Aug 17 07:33:56 pro5s1 kernel: [ 1710.934067] </IRQ>
Aug 17 07:33:56 pro5s1 kernel: [ 1710.934560] RIP: 0010:queued_spin_lock_slowpath+0x5b/0x1d0
Aug 17 07:33:56 pro5s1 kernel: [ 1710.935061] Code: 6d f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 47 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 8b 37 81 fe 00 01 00
Aug 17 07:33:56 pro5s1 kernel: [ 1710.936097] RSP: 0018:ffffb40640e53db0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
Aug 17 07:33:56 pro5s1 kernel: [ 1710.936610] RAX: 0000000000000101 RBX: ffffa0bbf36e0450 RCX: 61c8864680b583eb
Aug 17 07:33:56 pro5s1 kernel: [ 1710.937127] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa0bc0abc67d8
Aug 17 07:33:56 pro5s1 kernel: [ 1710.937630] RBP: ffffa0bc00ad3410 R08: ffffa0bbfa4dd068 R09: ffffb40640e53d10
Aug 17 07:33:56 pro5s1 kernel: [ 1710.938131] R10: 00000180607594dc R11: 0000000004a4cd10 R12: ffffa0bc0b8e4e40
Aug 17 07:33:56 pro5s1 kernel: [ 1710.938638] R13: ffffa0bc00a81338 R14: ffffa0bc0abc6000 R15: ffffa0bc0abc67d8
Aug 17 07:33:56 pro5s1 kernel: [ 1710.939152] gfs2_trans_add_meta+0x80/0x1f0 [gfs2]
Aug 17 07:33:56 pro5s1 kernel: [ 1710.939662] update_statfs+0x40/0x110 [gfs2]
Aug 17 07:33:56 pro5s1 kernel: [ 1710.940168] gfs2_statfs_sync+0x1b3/0x1f0 [gfs2]
Aug 17 07:33:56 pro5s1 kernel: [ 1710.940666] ? gfs2_statfs_sync+0x6c/0x1f0 [gfs2]
Aug 17 07:33:56 pro5s1 kernel: [ 1710.941162] gfs2_quotad+0x1c3/0x250 [gfs2]
Aug 17 07:33:56 pro5s1 kernel: [ 1710.941652] ? wait_woken+0x70/0x70
Aug 17 07:33:56 pro5s1 kernel: [ 1710.942150] kthread+0xf9/0x130
Aug 17 07:33:56 pro5s1 kernel: [ 1710.942639] ? gfs2_wake_up_statfs+0x40/0x40 [gfs2]
Aug 17 07:33:56 pro5s1 kernel: [ 1710.943122] ? kthread_park+0x90/0x90
Aug 17 07:33:56 pro5s1 kernel: [ 1710.943601] ret_from_fork+0x1f/0x40
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944562] NMI backtrace for cpu 6
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944562] CPU: 6 PID: 1187 Comm: gfs2_logd Tainted: G D O 5.4.58 #1
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944563] Hardware name: Quanta Cloud Technology Inc. QuantaMicro X10E-9N/S3E-MB, BIOS S3E_3B09.02 02/23/2018
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944563] RIP: 0010:queued_spin_lock_slowpath+0x5b/0x1d0
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944563] Code: 6d f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 47 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 8b 37 81 fe 00 01 00
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944564] RSP: 0018:ffffb40640e43e50 EFLAGS: 00000202
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944564] RAX: 0000000000000101 RBX: ffffa0bc0abc6000 RCX: 0000000000000000
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944564] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa0bc0abc68ec
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944565] RBP: ffffa0bc0abc6050 R08: 00000000000000a0 R09: 7fffffffffffffff
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944565] R10: 000001806066529c R11: 00000000224a1f99 R12: 0000000000000c01
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944565] R13: ffffa0bc0abc6848 R14: ffffa0bc0abc6000 R15: ffffa0bc085f2ac0
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944565] FS: 0000000000000000(0000) GS:ffffa0bc0fb80000(0000) knlGS:0000000000000000
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944566] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944566] CR2: 0000000000c65ce8 CR3: 0000000672a0a004 CR4: 00000000003606e0
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944566] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944566] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944567] Call Trace:
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944567] gfs2_ail1_empty+0x22/0x210 [gfs2]
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944567] ? __next_timer_interrupt+0xd0/0xd0
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944567] gfs2_logd+0xa1/0x2e0 [gfs2]
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944567] ? wait_woken+0x70/0x70
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944568] kthread+0xf9/0x130
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944568] ? gfs2_log_flush+0x640/0x640 [gfs2]
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944568] ? kthread_park+0x90/0x90
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944568] ret_from_fork+0x1f/0x40

It then happens again and again, every 3 minutes:

==> /var/log/messages <==
Aug 17 07:36:56 pro5s1 kernel: [ 1890.945806] Sending NMI from CPU 2 to CPUs 6:

-- 
Pierre-Philipp

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster