Hi,
I encountered a GlusterFS hang after a 10-hour file transfer test.
We are running GlusterFS 3.7.14 with NFS-Ganesha 2.3.2 on a 56-core SuperMicro machine.
>sudo system-docker stats gluster nfs
CONTAINER   CPU %      MEM USAGE / LIMIT     MEM %   NET I/O     BLOCK I/O
gluster     2694.74%   2.434 GB / 270.4 GB   0.90%   0 B / 0 B   0 B / 1.073 MB
nfs         30.07%     146.6 MB / 270.4 GB   0.05%   0 B / 0 B   4.096 kB / 0 B
>top capture:
root S 2556m 0% 24% /usr/local/sbin/glusterfsd -s denali-bm-qa-45 --volfile-id gluster-volume
I attached gdb to one of the glusterfsd threads; it reported:
#0 pthread_spin_lock () at ../sysdeps/x86_64/nptl/pthread_spin_lock.S:32
#1 0x00007f945f379ae5 in pl_inode_get (this=this@entry=0x7f9460010720,
inode=inode@entry=0x7f943ffe1edc) at common.c:416
#2 0x00007f945f3883be in pl_common_inodelk (frame=0x7f9467dc2ed8,
this=0x7f9460010720, volume=0x7f945b5a9ac0 "gluster-volume-disperse-0",
inode=0x7f943ffe1edc, cmd=6, flock=0x7f94678653d8, loc=0x7f94678652d8, fd=0x0,
xdata=0x7f946a2e9180) at inodelk.c:743
#3 0x00007f945f388e27 in pl_inodelk (frame=<optimized out>, this=<optimized
out>, volume=<optimized out>, loc=<optimized out>, cmd=<optimized out>,
flock=<optimized out>, xdata=0x7f946a2e9180) at inodelk.c:816
#4 0x00007f946a00b5c6 in default_inodelk (frame=0x7f9467dc2ed8,
this=0x7f9460011bf0, volume=0x7f945b5a9ac0 "gluster-volume-disperse-0",
loc=0x7f94678652d8, cmd=6, lock=0x7f94678653d8, xdata=0x7f946a2e9180) at
defaults.c:2032
#5 0x00007f946a01e324 in default_inodelk_resume (frame=0x7f9467dbabd4,
this=0x7f9460013070, volume=0x7f945b5a9ac0 "gluster-volume-disperse-0",
loc=0x7f94678652d8, cmd=6, lock=0x7f94678653d8, xdata=0x7f946a2e9180) at
defaults.c:1589
#6 0x00007f946a03c1ce in call_resume_wind (stub=<optimized out>) at
call-stub.c:2210
#7 0x00007f946a03c5bd in call_resume (stub=0x7f9467865298) at call-stub.c:2576
#8 0x00007f945ef5b2b2 in iot_worker (data=0x7f9460052ec0) at io-threads.c:215
#9 0x00007f946979270a in start_thread (arg=0x7f943cd5e700) at
pthread_create.c:333
#10 0x00007f94694c882d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
It shows that many glusterfsd worker threads are spinning in pthread_spin_lock(), all waiting for the same lock to be released, which is what drives the CPU load so high. pstree shows the glusterfsd processes and their threads:
|-glusterfsd(772)-+-{glusterfsd}(773)
| |-{glusterfsd}(774)
| |-{glusterfsd}(775)
| |-{glusterfsd}(776)
| |-{glusterfsd}(777)
| |-{glusterfsd}(778)
| |-{glusterfsd}(779)
| |-{glusterfsd}(780)
| |-{glusterfsd}(781)
| |-{glusterfsd}(782)
| |-{glusterfsd}(783)
| |-{glusterfsd}(784)
| |-{glusterfsd}(785)
| |-{glusterfsd}(786)
| |-{glusterfsd}(787)
| |-{glusterfsd}(788)
| `-{glusterfsd}(789)
|-glusterfsd(791)-+-{glusterfsd}(792)
| |-{glusterfsd}(793)
| |-{glusterfsd}(794)
| |-{glusterfsd}(795)
| |-{glusterfsd}(796)
| |-{glusterfsd}(797)
| |-{glusterfsd}(798)
| |-{glusterfsd}(799)
| |-{glusterfsd}(800)
| |-{glusterfsd}(801)
| |-{glusterfsd}(802)
| |-{glusterfsd}(803)
| |-{glusterfsd}(804)
| |-{glusterfsd}(805)
| |-{glusterfsd}(806)
| |-{glusterfsd}(807)
| `-{glusterfsd}(808)
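So far I have only looked at a single thread in gdb. To see which thread actually holds the lock, as opposed to the ones spinning on it, I am thinking of dumping backtraces of all threads in the busy brick process next time, roughly like this (using PID 772 from the pstree above only as an example):

>sudo gdb -p 772 -batch -ex 'set pagination off' -ex 'thread apply all bt' > glusterfsd-772-threads.txt

The idea is to check whether exactly one thread sits inside the locks translator holding the inode lock while all the others spin in pl_inode_get(). Does that sound like the right approach?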
If we just wait for a few hours, the system recovers to normal on its own.
I am wondering how to dig deeper and find out what caused one of the threads to hold the lock for so long. Please give me your professional advice.
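For example, would a short CPU profile of the busy glusterfsd help to see what the lock holder is doing, something like:

>sudo perf record -g -p 772 -- sleep 30
>sudo perf report

(PID 772 is again only a placeholder for the busy brick process)? Or would a statedump (kill -USR1 on the brick PID, written under /var/run/gluster by default, if I understand correctly) be more useful here?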
Best Regards!
James Zhu