Hello #linux-cluster

David's guide (http://people.redhat.com/teigland/cluster4-gfs2-dlm.txt) was 
quite helpful in my attempt to run GFS2 on top of a DRBD mirror with a minimal 
amount of cluster setup.

Instead of using RHEL, I built all the required pieces from scratch on 
Slackware Linux -current (Aug 2020): DRBD 9, drbd-utils, libqb, kronosnet, 
corosync and the userland dlm (https://pagure.io/dlm).

The DRBD node names are `pro5s1` and `pro5s2`, and the Corosync/DLM/GFS2 
cluster name is `pro5s`.  It seems I have found a bug.  Your opinion and advice 
on how to deal with it would be much appreciated.  Please just don't tell me to 
switch to RHEL or CentOS: I assume these fine pieces of source are meant to 
work anywhere.  Besides, I hope this kind of feedback is of interest to you.  
Thank you.

Suspected causes: either the fact that OCFS2 is also mentioned in dmesg (?!), 
or some network-related discrepancy, as the cluster network goes through a VLAN 
I have no control over.
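In the listings below, each output line is prefixed with the name of the node it came from.  The tool used to collect them is not stated.  The sketch below shows one way to get output in that shape.  It assumes passwordless root SSH to both nodes, and `on_both`, `NODES` and `REMOTE` are hypothetical names:

```shell
# Hypothetical helper: run one command on every node and prefix each
# output line with "<node>: ", as in the per-node listings of this report.
# Assumes passwordless SSH; REMOTE is left unquoted on purpose so that
# extra options (e.g. "ssh -o BatchMode=yes") can be passed through it.
NODES="${NODES:-pro5s1 pro5s2}"
REMOTE="${REMOTE:-ssh}"

on_both() {
    for node in $NODES; do
        # 2>&1 keeps remote error messages in the same prefixed stream
        $REMOTE "$node" "$*" 2>&1 | sed "s/^/$node: /"
    done
}
```

For example, `on_both 'uname -a'` or `on_both "zcat /proc/config.gz | grep -E 'GFS2|DLM'"`.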

uname -a
pro5s1: Linux pro5s1 5.4.58 #1 SMP Tue Aug 11 14:42:19 CDT 2020 x86_64 Intel(R) Xeon(R) CPU E3-1240 v6 @ 3.70GHz GenuineIntel GNU/Linux
pro5s2: Linux pro5s2 5.4.58 #1 SMP Tue Aug 11 14:42:19 CDT 2020 x86_64 Intel(R) Xeon(R) CPU E3-1240 v6 @ 3.70GHz GenuineIntel GNU/Linux

cat /proc/cmdline
pro5s1: BOOT_IMAGE=/boot/vmlinuz root=/dev/sda1 ro mitigations=off vga=791
pro5s2: BOOT_IMAGE=/boot/vmlinuz root=/dev/sda1 ro mitigations=off vga=791

dmesg | grep -E 'drbd.*Version'
pro5s1: [   11.735205] drbd: initialized. Version: 9.0.24-1 (api:2/proto:86-117)
pro5s2: [   11.753043] drbd: initialized. Version: 9.0.24-1 (api:2/proto:86-117)

drbdadm -V
pro5s1: DRBDADM_BUILDTAG=GIT-hash:\ a513dea1cf000164fd87e56525d098a426131a86\ build\ by\ root@pro5s1\,\ 2020-08-17\ 06:59:43
pro5s1: DRBDADM_API_VERSION=2
pro5s1: DRBD_KERNEL_VERSION_CODE=0x090018
pro5s1: DRBD_KERNEL_VERSION=9.0.24
pro5s1: DRBDADM_VERSION_CODE=0x090d00
pro5s1: DRBDADM_VERSION=9.13.0
pro5s2: DRBDADM_BUILDTAG=GIT-hash:\ a513dea1cf000164fd87e56525d098a426131a86\ build\ by\ root@pro5s2\,\ 2020-08-17\ 07:00:55
pro5s2: DRBDADM_API_VERSION=2
pro5s2: DRBD_KERNEL_VERSION_CODE=0x090018
pro5s2: DRBD_KERNEL_VERSION=9.0.24
pro5s2: DRBDADM_VERSION_CODE=0x090d00
pro5s2: DRBDADM_VERSION=9.13.0

zcat /proc/config.gz | grep -E 'GFS2|DLM'
pro5s1: CONFIG_GFS2_FS=m
pro5s1: CONFIG_GFS2_FS_LOCKING_DLM=y
pro5s1: CONFIG_DLM=m
pro5s1: # CONFIG_DLM_DEBUG is not set
pro5s2: CONFIG_GFS2_FS=m
pro5s2: CONFIG_GFS2_FS_LOCKING_DLM=y
pro5s2: CONFIG_DLM=m
pro5s2: # CONFIG_DLM_DEBUG is not set

dmesg | grep -Ei 'gfs2|dlm'
pro5s1: [    0.449985] OCFS2 User DLM kernel interface loaded
pro5s1: [   11.080928] DLM installed
pro5s1: [   11.087535] gfs2: GFS2 installed
pro5s2: [    0.447813] OCFS2 User DLM kernel interface loaded
pro5s2: [   57.100751] DLM installed
pro5s2: [   57.107343] gfs2: GFS2 installed

---> I am not sure why OCFS2 is mentioned here.  There are indeed a few DLM 
modules available:

/lib/modules/5.4.58/kernel/fs/dlm
/lib/modules/5.4.58/kernel/fs/dlm/dlm.ko
/lib/modules/5.4.58/kernel/fs/ocfs2/dlm
/lib/modules/5.4.58/kernel/fs/ocfs2/dlm/ocfs2_dlm.ko
/lib/modules/5.4.58/kernel/fs/ocfs2/dlmfs
/lib/modules/5.4.58/kernel/fs/ocfs2/dlmfs/ocfs2_dlmfs.ko

but as far as I know it's fs/dlm/dlm.ko that gets loaded, not ocfs2_dlm.ko.
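The check below tells whether each of these DLM pieces is a loaded module or is compiled into the kernel image.  It compares `/proc/modules`, which lists loaded modules, with `modules.builtin` under `/lib/modules/<version>/`, which lists built-in ones.  This is a minimal sketch: `dlm_where` is a hypothetical helper, and both file paths are passed in so it can also be run against copies taken from each node:

```shell
# Hypothetical helper: report, for each module name, whether it is
# currently loaded as a module, compiled into the kernel, or neither.
# $1 = path to /proc/modules, $2 = path to modules.builtin, rest = names.
dlm_where() {
    procmods=$1; builtin=$2; shift 2
    for m in "$@"; do
        if grep -q "^$m " "$procmods"; then
            echo "$m: loaded module"
        elif grep -q "/$m\.ko\$" "$builtin"; then
            # modules.builtin lists paths such as kernel/fs/dlm/dlm.ko
            echo "$m: built-in"
        else
            echo "$m: not present"
        fi
    done
}
```

For example, `dlm_where /proc/modules "/lib/modules/$(uname -r)/modules.builtin" dlm ocfs2_dlm ocfs2_dlmfs`.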

corosync -v
pro5s1: Corosync Cluster Engine, version '3.0.4'
pro5s1: Copyright (c) 2006-2018 Red Hat, Inc.
pro5s2: Corosync Cluster Engine, version '3.0.4'
pro5s2: Copyright (c) 2006-2018 Red Hat, Inc.

For happy-path testing, fencing is disabled:

cat /etc/dlm/dlm.conf
pro5s1: enable_fencing=0
pro5s2: enable_fencing=0

lsmod | grep -E 'dlm|gfs2|drbd'
pro5s1: drbd_transport_tcp     28672  3
pro5s1: drbd                  638976  4 drbd_transport_tcp
pro5s1: gfs2                  503808  0
pro5s1: dlm                   196608  9 gfs2
pro5s2: gfs2                  503808  0
pro5s2: dlm                   196608  9 gfs2
pro5s2: drbd_transport_tcp     24576  3
pro5s2: drbd                  544768  4 drbd_transport_tcp

There is no need to make the second node primary yet (dual-primary mode is 
enabled, so it could be done if needed, but we are not that far yet):

drbdadm status res-data
pro5s1: res-data role:Primary
pro5s1:   disk:UpToDate
pro5s1:   pro5s2 role:Secondary
pro5s1:     peer-disk:UpToDate
pro5s1:
pro5s2: res-data role:Secondary
pro5s2:   disk:UpToDate
pro5s2:   pro5s1 role:Primary
pro5s2:     peer-disk:UpToDate
pro5s2:
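The status check above can also be scripted before mounting.  This minimal sketch assumes the plain-text `drbdadm status` layout shown above, and `drbd_all_uptodate` is a hypothetical helper.  It succeeds only if every `disk:` and `peer-disk:` field reads `UpToDate`:

```shell
# Hypothetical pre-mount check, reading `drbdadm status` output on stdin.
# 'disk:[A-Za-z]*' also matches the "disk:..." part of "peer-disk:...".
drbd_all_uptodate() {
    states=$(grep -o 'disk:[A-Za-z]*' | cut -d: -f2)
    # Fail if no disk state was found at all
    [ -n "$states" ] || return 1
    # Fail if any state is something other than UpToDate
    ! printf '%s\n' "$states" | grep -qv '^UpToDate$'
}
```

For example, `drbdadm status res-data | drbd_all_uptodate && mount -t gfs2 /dev/drbd0 /data`.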

pgrep -a dlm
pro5s1: 172 user_dlm
pro5s1: 1038 /usr/sbin/dlm_controld
pro5s1: 1039 /usr/sbin/dlm_controld
pro5s2: 172 user_dlm
pro5s2: 1033 /usr/sbin/dlm_controld
pro5s2: 1042 /usr/sbin/dlm_controld

corosync-quorumtool
Quorum information
------------------
Date:             Mon Aug 17 07:27:25 2020
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          1
Ring ID:          1.109
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1
Flags:            2Node Quorate WaitForAll

Membership information
----------------------
    Nodeid      Votes Name
         1          1 pro5s1 (local)
         2          1 pro5s2

Now mounting GFS2

pro5s1# mount -t gfs2 /dev/drbd0 /data
pro5s1# ll /data
total 8.0K
drwxr-xr-x  2 root root 3.8K Aug 11 20:27 ./
drwxr-xr-x 24 root root 4.0K Aug 17 06:44 ../

The logs look fine so far:

==> /var/log/messages <==
Aug 17 07:31:24 pro5s1 kernel: [ 1559.513131] gfs2: fsid=pro5s:data: Trying to join cluster "lock_dlm", "pro5s:data"
Aug 17 07:31:24 pro5s1 kernel: [ 1559.513535] dlm: data: joining the lockspace group...
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516905] dlm: data: dlm_recover 1
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516924] dlm: data: add member 1
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516925] dlm: data: dlm_recover_members 1 nodes
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516926] dlm: data: generation 1 slots 1 1:1
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516926] dlm: data: dlm_recover_directory
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516927] dlm: data: dlm_recover_directory 0 in 0 new
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516928] dlm: data: dlm_recover_directory 0 out 0 messages
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516930] dlm: data: group event done 0 0
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516931] dlm: data: join complete
Aug 17 07:31:24 pro5s1 kernel: [ 1559.516937] dlm: data: dlm_recover 1 generation 1 done: 0 ms
Aug 17 07:31:25 pro5s1 kernel: [ 1560.018339] gfs2: fsid=pro5s:data: first mounter control generation 0
Aug 17 07:31:25 pro5s1 kernel: [ 1560.018340] gfs2: fsid=pro5s:data: Joined cluster. Now mounting FS...
Aug 17 07:31:25 pro5s1 kernel: [ 1560.024737] gfs2: fsid=pro5s:data.0: journal 0 mapped with 1 extents in 0ms
Aug 17 07:31:25 pro5s1 kernel: [ 1560.024741] gfs2: fsid=pro5s:data.0: jid=0, already locked for use
Aug 17 07:31:25 pro5s1 kernel: [ 1560.024741] gfs2: fsid=pro5s:data.0: jid=0: Looking at journal...
Aug 17 07:31:25 pro5s1 kernel: [ 1560.064018] gfs2: fsid=pro5s:data.0: jid=0: Journal head lookup took 39ms
Aug 17 07:31:25 pro5s1 kernel: [ 1560.064077] gfs2: fsid=pro5s:data.0: jid=0: Done
Aug 17 07:31:25 pro5s1 kernel: [ 1560.064119] gfs2: fsid=pro5s:data.0: jid=1: Trying to acquire journal lock...
Aug 17 07:31:25 pro5s1 kernel: [ 1560.064343] gfs2: fsid=pro5s:data.0: jid=1: Looking at journal...
Aug 17 07:31:25 pro5s1 kernel: [ 1560.066188] gfs2: fsid=pro5s:data.0: journal 1 mapped with 1 extents in 0ms
Aug 17 07:31:25 pro5s1 kernel: [ 1560.114308] gfs2: fsid=pro5s:data.0: jid=1: Journal head lookup took 49ms
Aug 17 07:31:25 pro5s1 kernel: [ 1560.114338] gfs2: fsid=pro5s:data.0: jid=1: Done
Aug 17 07:31:25 pro5s1 kernel: [ 1560.114352] gfs2: fsid=pro5s:data.0: first mount done, others may mount

==> /var/log/syslog <==
Aug 17 07:31:24 pro5s1 kernel: [ 1559.513371] dlm: Using TCP for communications

Trying to write something:

pro5s1# echo ok > /data/file-check

Oops, here comes the bug:

==> /var/log/messages <==
Aug 17 07:32:30 pro5s1 kernel: [ 1625.215500] PGD 8435be067 P4D 8435be067 PUD 844829067 PMD 0

==> /var/log/syslog <==
Aug 17 07:32:30 pro5s1 kernel: [ 1625.214399] BUG: kernel NULL pointer dereference, address: 0000000000000008
Aug 17 07:32:30 pro5s1 kernel: [ 1625.214774] #PF: supervisor write access in kernel mode
Aug 17 07:32:30 pro5s1 kernel: [ 1625.215140] #PF: error_code(0x0002) - not-present page
Aug 17 07:32:30 pro5s1 kernel: [ 1625.215860] Oops: 0002 [#1] SMP NOPTI
Aug 17 07:32:30 pro5s1 kernel: [ 1625.216218] CPU: 0 PID: 1060 Comm: bash Tainted: G           O      5.4.58 #1
Aug 17 07:32:30 pro5s1 kernel: [ 1625.216583] Hardware name: Quanta Cloud Technology Inc. QuantaMicro X10E-9N/S3E-MB, BIOS S3E_3B09.02 02/23/2018
Aug 17 07:32:30 pro5s1 kernel: [ 1625.216974] RIP: 0010:gfs2_log_commit+0xf4/0x3f0 [gfs2]
Aug 17 07:32:30 pro5s1 kernel: [ 1625.217358] Code: 48 89 45 60 48 8d bb ec 08 00 00 e8 16 12 94 dd 48 8b 55 70 48 8d 45 70 48 39 d0 74 29 49 8b 4c 24 78 48 8b 75 70 48 8b 55 78 <48> 89 4e 08 48 89 31 49 8d 4c 24 70 48 89 0a 49 89 54 24 78 48 89
Aug 17 07:32:30 pro5s1 kernel: [ 1625.218155] RSP: 0018:ffffb406403f7b38 EFLAGS: 00010282
Aug 17 07:32:30 pro5s1 kernel: [ 1625.218549] RAX: ffffa0bc0341bbb0 RBX: ffffa0bc0abc6000 RCX: 0000000000000000
Aug 17 07:32:30 pro5s1 kernel: [ 1625.218947] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa0bc0abc68ec
Aug 17 07:32:30 pro5s1 kernel: [ 1625.219346] RBP: ffffa0bc0341bb40 R08: 0000000000000001 R09: ffffa0bc00b235d0
Aug 17 07:32:30 pro5s1 kernel: [ 1625.219742] R10: 78445afabba0ffff R11: 0000000000000001 R12: ffffa0bc0341b840
Aug 17 07:32:30 pro5s1 kernel: [ 1625.220136] R13: ffffa0bc0abc6878 R14: ffffa0bc00adb9a8 R15: ffffa0bc0abc6000
Aug 17 07:32:30 pro5s1 kernel: [ 1625.220531] FS:  00007f46af6ad740(0000) GS:ffffa0bc0fa00000(0000) knlGS:0000000000000000
Aug 17 07:32:30 pro5s1 kernel: [ 1625.220963] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 17 07:32:30 pro5s1 kernel: [ 1625.221361] CR2: 0000000000000008 CR3: 00000008448aa002 CR4: 00000000003606f0
Aug 17 07:32:30 pro5s1 kernel: [ 1625.221774] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 17 07:32:30 pro5s1 kernel: [ 1625.222168] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 17 07:32:30 pro5s1 kernel: [ 1625.222551] Call Trace:
Aug 17 07:32:30 pro5s1 kernel: [ 1625.222931]  gfs2_trans_end+0x7d/0x160 [gfs2]
Aug 17 07:32:30 pro5s1 kernel: [ 1625.223305]  gfs2_create_inode+0xb5e/0x1330 [gfs2]
Aug 17 07:32:30 pro5s1 kernel: [ 1625.223670]  ? gfs2_create_inode+0x103/0x1330 [gfs2]
Aug 17 07:32:30 pro5s1 kernel: [ 1625.224028]  ? gfs2_create_inode+0x9dc/0x1330 [gfs2]
Aug 17 07:32:30 pro5s1 kernel: [ 1625.224383]  gfs2_atomic_open+0x56/0xe0 [gfs2]
Aug 17 07:32:30 pro5s1 kernel: [ 1625.224737]  path_openat+0x8ea/0x1570
Aug 17 07:32:30 pro5s1 kernel: [ 1625.225106]  do_filp_open+0x91/0x100
Aug 17 07:32:30 pro5s1 kernel: [ 1625.225436]  ? __check_object_size+0x136/0x147
Aug 17 07:32:30 pro5s1 kernel: [ 1625.225764]  do_sys_open+0x184/0x220
Aug 17 07:32:30 pro5s1 kernel: [ 1625.226094]  do_syscall_64+0x4c/0x170
Aug 17 07:32:30 pro5s1 kernel: [ 1625.226426]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug 17 07:32:30 pro5s1 kernel: [ 1625.226767] RIP: 0033:0x7f46af7b65d7
Aug 17 07:32:30 pro5s1 kernel: [ 1625.227105] Code: 25 00 00 41 00 3d 00 00 41 00 74 37 64 8b 04 25 18 00 00 00 85 c0 75 5b 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 85 00 00 00 48 83 c4 68 5d 41 5c c3 0f 1f
Aug 17 07:32:30 pro5s1 kernel: [ 1625.227812] RSP: 002b:00007fff93fb0220 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Aug 17 07:32:30 pro5s1 kernel: [ 1625.228170] RAX: ffffffffffffffda RBX: 0000000001889f08 RCX: 00007f46af7b65d7
Aug 17 07:32:30 pro5s1 kernel: [ 1625.228536] RDX: 0000000000000241 RSI: 000000000181ad48 RDI: 00000000ffffff9c
Aug 17 07:32:30 pro5s1 kernel: [ 1625.228957] RBP: 000000000181ad48 R08: 0000000000000000 R09: 0000000000000020
Aug 17 07:32:30 pro5s1 kernel: [ 1625.229319] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000241
Aug 17 07:32:30 pro5s1 kernel: [ 1625.229683] R13: 0000000000000000 R14: 0000000000000001 R15: 000000000181ad48
Aug 17 07:32:30 pro5s1 kernel: [ 1625.230048] Modules linked in: drbd_transport_tcp(O) drbd(O) bridge 8021q garp mrp stp llc ipv6 nf_defrag_ipv6 gfs2 dlm coretemp intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp kvm_intel joydev kvm hid_generic irqbypass ast usbhid drm_vram_helper ttm crct10dif_pclmul crc32_pclmul hid drm_kms_helper ghash_clmulni_intel evdev drm rapl agpgart ipmi_ssif fb_sys_fops dm_thin_pool igb intel_cstate syscopyarea dm_persistent_data dca sysfillrect i2c_i801 i2c_algo_bit dm_bio_prison sysimgblt dm_bufio i2c_core ipmi_si ipmi_devintf xhci_pci mei_me ipmi_msghandler acpi_power_meter pinctrl_sunrisepoint intel_pch_thermal xhci_hcd pinctrl_intel mei hwmon video acpi_pad thermal fan button loop
Aug 17 07:32:30 pro5s1 kernel: [ 1625.232548] CR2: 0000000000000008
Aug 17 07:32:30 pro5s1 kernel: [ 1625.233834] ---[ end trace ea2a9d0e210031cf ]---
Aug 17 07:32:30 pro5s1 kernel: [ 1625.236334] RIP: 0010:gfs2_log_commit+0xf4/0x3f0 [gfs2]
Aug 17 07:32:30 pro5s1 kernel: [ 1625.236808] Code: 48 89 45 60 48 8d bb ec 08 00 00 e8 16 12 94 dd 48 8b 55 70 48 8d 45 70 48 39 d0 74 29 49 8b 4c 24 78 48 8b 75 70 48 8b 55 78 <48> 89 4e 08 48 89 31 49 8d 4c 24 70 48 89 0a 49 89 54 24 78 48 89
Aug 17 07:32:30 pro5s1 kernel: [ 1625.237715] RSP: 0018:ffffb406403f7b38 EFLAGS: 00010282
Aug 17 07:32:30 pro5s1 kernel: [ 1625.238156] RAX: ffffa0bc0341bbb0 RBX: ffffa0bc0abc6000 RCX: 0000000000000000
Aug 17 07:32:30 pro5s1 kernel: [ 1625.238597] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa0bc0abc68ec
Aug 17 07:32:30 pro5s1 kernel: [ 1625.239032] RBP: ffffa0bc0341bb40 R08: 0000000000000001 R09: ffffa0bc00b235d0
Aug 17 07:32:30 pro5s1 kernel: [ 1625.239455] R10: 78445afabba0ffff R11: 0000000000000001 R12: ffffa0bc0341b840
Aug 17 07:32:30 pro5s1 kernel: [ 1625.239872] R13: ffffa0bc0abc6878 R14: ffffa0bc00adb9a8 R15: ffffa0bc0abc6000
Aug 17 07:32:30 pro5s1 kernel: [ 1625.240287] FS:  00007f46af6ad740(0000) GS:ffffa0bc0fa00000(0000) knlGS:0000000000000000
Aug 17 07:32:30 pro5s1 kernel: [ 1625.240706] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 17 07:32:30 pro5s1 kernel: [ 1625.241157] CR2: 0000000000000008 CR3: 00000008448aa002 CR4: 00000000003606f0
Aug 17 07:32:30 pro5s1 kernel: [ 1625.241584] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 17 07:32:30 pro5s1 kernel: [ 1625.242007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

After that, one or two cores (out of 8) sit at 100% while corosync complains:

==> /var/log/messages <==
Aug 17 07:32:56 pro5s1 corosync[1022]:   [KNET  ] link: host: 2 link: 0 is down
Aug 17 07:32:56 pro5s1 corosync[1022]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)

==> /var/log/syslog <==
Aug 17 07:32:56 pro5s1 corosync[1022]:   [KNET  ] host: host: 2 has no active links

==> /var/log/messages <==
Aug 17 07:32:56 pro5s1 last message buffered 1 times
Aug 17 07:32:56 pro5s1 corosync[1022]:   [TOTEM ] Token has not been received in 750 ms
Aug 17 07:32:56 pro5s1 corosync[1022]:   [TOTEM ] A processor failed, forming new configuration.
Aug 17 07:32:58 pro5s1 corosync[1022]:   [TOTEM ] A new membership (1.10d) was formed. Members left: 2
Aug 17 07:32:58 pro5s1 corosync[1022]:   [TOTEM ] Failed to receive the leave message. failed: 2
Aug 17 07:32:58 pro5s1 corosync[1022]:   [QUORUM] Members[1]: 1
Aug 17 07:32:58 pro5s1 corosync[1022]:   [MAIN  ] Completed service synchronization, ready to provide service.

==> /var/log/syslog <==
Aug 17 07:32:56 pro5s1 last message buffered 1 times
Aug 17 07:32:58 pro5s1 kernel: [ 1653.007070] dlm: closing connection to node 2

==> /var/log/messages <==
Aug 17 07:33:53 pro5s1 kernel: [ 1708.732408] drbd res-sabotage pro5s2: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
Aug 17 07:33:53 pro5s1 kernel: [ 1708.732845] drbd res-sabotage/0 drbd1 pro5s2: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Aug 17 07:33:53 pro5s1 kernel: [ 1708.733297] drbd res-sabotage pro5s2: ack_receiver terminated
Aug 17 07:33:53 pro5s1 kernel: [ 1708.733753] drbd res-sabotage pro5s2: Terminating ack_recv thread
Aug 17 07:33:53 pro5s1 kernel: [ 1708.750928] drbd res-sabotage pro5s2: sock was shut down by peer

==> /var/log/syslog <==
Aug 17 07:33:53 pro5s1 kernel: [ 1708.731959] drbd res-sabotage pro5s2: PingAck did not arrive in time.

Then comes another round of output:

==> /var/log/messages <==
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944069] Sending NMI from CPU 2 to CPUs 6:

==> /var/log/syslog <==
Aug 17 07:33:56 pro5s1 kernel: [ 1710.919936] rcu: INFO: rcu_sched self-detected stall on CPU
Aug 17 07:33:56 pro5s1 kernel: [ 1710.920537] rcu:      2-....: (59999 ticks this GP) idle=8fe/1/0x4000000000000002 softirq=9205/9205 fqs=15000 last_accelerate: 9bb3/8684, Nonlazy posted: .LD
Aug 17 07:33:56 pro5s1 kernel: [ 1710.921688]   (t=60001 jiffies g=443285 q=4488)
Aug 17 07:33:56 pro5s1 kernel: [ 1710.922261] NMI backtrace for cpu 2
Aug 17 07:33:56 pro5s1 kernel: [ 1710.922837] CPU: 2 PID: 1188 Comm: gfs2_quotad Tainted: G      D    O      5.4.58 #1
Aug 17 07:33:56 pro5s1 kernel: [ 1710.923432] Hardware name: Quanta Cloud Technology Inc. QuantaMicro X10E-9N/S3E-MB, BIOS S3E_3B09.02 02/23/2018
Aug 17 07:33:56 pro5s1 kernel: [ 1710.924067] Call Trace:
Aug 17 07:33:56 pro5s1 kernel: [ 1710.924709]  <IRQ>
Aug 17 07:33:56 pro5s1 kernel: [ 1710.925338]  dump_stack+0x50/0x70
Aug 17 07:33:56 pro5s1 kernel: [ 1710.925975]  nmi_cpu_backtrace.cold+0x14/0x53
Aug 17 07:33:56 pro5s1 kernel: [ 1710.926592]  ? lapic_can_unplug_cpu.cold+0x39/0x39
Aug 17 07:33:56 pro5s1 kernel: [ 1710.927200]  nmi_trigger_cpumask_backtrace+0xc5/0xc7
Aug 17 07:33:56 pro5s1 kernel: [ 1710.927812]  rcu_dump_cpu_stacks+0x92/0xc0
Aug 17 07:33:56 pro5s1 kernel: [ 1710.928404]  rcu_sched_clock_irq.cold+0x1b5/0x3a9
Aug 17 07:33:56 pro5s1 kernel: [ 1710.929006]  ? trigger_load_balance+0x5a/0x210
Aug 17 07:33:56 pro5s1 kernel: [ 1710.929603]  update_process_times+0x24/0x60
Aug 17 07:33:56 pro5s1 kernel: [ 1710.930199]  tick_sched_handle+0x34/0x50
Aug 17 07:33:56 pro5s1 kernel: [ 1710.930798]  tick_sched_timer+0x38/0x80
Aug 17 07:33:56 pro5s1 kernel: [ 1710.931382]  ? tick_sched_do_timer+0x60/0x60
Aug 17 07:33:56 pro5s1 kernel: [ 1710.931956]  __hrtimer_run_queues+0xf6/0x270
Aug 17 07:33:56 pro5s1 kernel: [ 1710.932511]  hrtimer_interrupt+0x10e/0x240
Aug 17 07:33:56 pro5s1 kernel: [ 1710.933044]  smp_apic_timer_interrupt+0x6c/0x130
Aug 17 07:33:56 pro5s1 kernel: [ 1710.933560]  apic_timer_interrupt+0xf/0x20
Aug 17 07:33:56 pro5s1 kernel: [ 1710.934067]  </IRQ>
Aug 17 07:33:56 pro5s1 kernel: [ 1710.934560] RIP: 0010:queued_spin_lock_slowpath+0x5b/0x1d0
Aug 17 07:33:56 pro5s1 kernel: [ 1710.935061] Code: 6d f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 47 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 8b 37 81 fe 00 01 00
Aug 17 07:33:56 pro5s1 kernel: [ 1710.936097] RSP: 0018:ffffb40640e53db0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
Aug 17 07:33:56 pro5s1 kernel: [ 1710.936610] RAX: 0000000000000101 RBX: ffffa0bbf36e0450 RCX: 61c8864680b583eb
Aug 17 07:33:56 pro5s1 kernel: [ 1710.937127] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa0bc0abc67d8
Aug 17 07:33:56 pro5s1 kernel: [ 1710.937630] RBP: ffffa0bc00ad3410 R08: ffffa0bbfa4dd068 R09: ffffb40640e53d10
Aug 17 07:33:56 pro5s1 kernel: [ 1710.938131] R10: 00000180607594dc R11: 0000000004a4cd10 R12: ffffa0bc0b8e4e40
Aug 17 07:33:56 pro5s1 kernel: [ 1710.938638] R13: ffffa0bc00a81338 R14: ffffa0bc0abc6000 R15: ffffa0bc0abc67d8
Aug 17 07:33:56 pro5s1 kernel: [ 1710.939152]  gfs2_trans_add_meta+0x80/0x1f0 [gfs2]
Aug 17 07:33:56 pro5s1 kernel: [ 1710.939662]  update_statfs+0x40/0x110 [gfs2]
Aug 17 07:33:56 pro5s1 kernel: [ 1710.940168]  gfs2_statfs_sync+0x1b3/0x1f0 [gfs2]
Aug 17 07:33:56 pro5s1 kernel: [ 1710.940666]  ? gfs2_statfs_sync+0x6c/0x1f0 [gfs2]
Aug 17 07:33:56 pro5s1 kernel: [ 1710.941162]  gfs2_quotad+0x1c3/0x250 [gfs2]
Aug 17 07:33:56 pro5s1 kernel: [ 1710.941652]  ? wait_woken+0x70/0x70
Aug 17 07:33:56 pro5s1 kernel: [ 1710.942150]  kthread+0xf9/0x130
Aug 17 07:33:56 pro5s1 kernel: [ 1710.942639]  ? gfs2_wake_up_statfs+0x40/0x40 [gfs2]
Aug 17 07:33:56 pro5s1 kernel: [ 1710.943122]  ? kthread_park+0x90/0x90
Aug 17 07:33:56 pro5s1 kernel: [ 1710.943601]  ret_from_fork+0x1f/0x40
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944562] NMI backtrace for cpu 6
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944562] CPU: 6 PID: 1187 Comm: gfs2_logd Tainted: G      D    O      5.4.58 #1
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944563] Hardware name: Quanta Cloud Technology Inc. QuantaMicro X10E-9N/S3E-MB, BIOS S3E_3B09.02 02/23/2018
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944563] RIP: 0010:queued_spin_lock_slowpath+0x5b/0x1d0
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944563] Code: 6d f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 47 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 8b 37 81 fe 00 01 00
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944564] RSP: 0018:ffffb40640e43e50 EFLAGS: 00000202
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944564] RAX: 0000000000000101 RBX: ffffa0bc0abc6000 RCX: 0000000000000000
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944564] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa0bc0abc68ec
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944565] RBP: ffffa0bc0abc6050 R08: 00000000000000a0 R09: 7fffffffffffffff
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944565] R10: 000001806066529c R11: 00000000224a1f99 R12: 0000000000000c01
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944565] R13: ffffa0bc0abc6848 R14: ffffa0bc0abc6000 R15: ffffa0bc085f2ac0
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944565] FS:  0000000000000000(0000) GS:ffffa0bc0fb80000(0000) knlGS:0000000000000000
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944566] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944566] CR2: 0000000000c65ce8 CR3: 0000000672a0a004 CR4: 00000000003606e0
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944566] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944566] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944567] Call Trace:
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944567]  gfs2_ail1_empty+0x22/0x210 [gfs2]
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944567]  ? __next_timer_interrupt+0xd0/0xd0
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944567]  gfs2_logd+0xa1/0x2e0 [gfs2]
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944567]  ? wait_woken+0x70/0x70
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944568]  kthread+0xf9/0x130
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944568]  ? gfs2_log_flush+0x640/0x640 [gfs2]
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944568]  ? kthread_park+0x90/0x90
Aug 17 07:33:56 pro5s1 kernel: [ 1710.944568]  ret_from_fork+0x1f/0x40

It then happens again and again, every 3 minutes:

==> /var/log/messages <==
Aug 17 07:36:56 pro5s1 kernel: [ 1890.945806] Sending NMI from CPU 2 to CPUs 6:
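The three-minute period can be read off the kernel timestamps of the two "Sending NMI" lines: 1890.945806 - 1710.944069 = 180.001737 s.  The sketch below prints the interval between successive such lines in a log file.  `stall_intervals` is a hypothetical helper, and it assumes the `[ seconds.micro]` timestamp format used in the logs above:

```shell
# Hypothetical helper: print the whole seconds elapsed between successive
# "Sending NMI from CPU" lines (one per stall report in this log).
stall_intervals() {
    grep 'Sending NMI from CPU' "$1" |
        # keep only the kernel timestamp between "[" and "]"
        sed -n 's/.*\[ *\([0-9][0-9]*\.[0-9]*\)].*/\1/p' |
        # print the difference between each timestamp and the previous one
        awk 'NR > 1 { printf "%.0f\n", $1 - prev } { prev = $1 }'
}
```

For example, `stall_intervals /var/log/messages`.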

-- 
Pierre-Philipp


-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
