+Poornima who works on parallel-readdir. @Poornima, Have you seen anything like this before?
On 14 June 2018 at 10:07, Nithya Balachandran <[email protected]> wrote: > This is not the same issue as the one you are referring - that was in the > RPC layer and caused the bricks to crash. This one is different as it seems > to be in the dht and rda layers. It does look like a stack overflow though. > > @Mohammad, > > Please send the following information: > > 1. gluster volume info > 2. The number of entries in the directory being listed > 3. System memory > > Does this still happen if you turn off parallel-readdir? > > Regards, > Nithya > > > > > On 13 June 2018 at 16:40, Milind Changire <[email protected]> wrote: > >> +Nithya >> >> Nithya, >> Do these logs [1] look similar to the recursive readdir() issue that you >> encountered just a while back ? >> i.e. recursive readdir() response definition in the XDR >> >> [1] http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log >> >> >> On Wed, Jun 13, 2018 at 4:29 PM, mohammad kashif <[email protected]> >> wrote: >> >>> Hi Milind >>> >>> Thanks a lot, I manage to run gdb and produced traceback as well. Its >>> here >>> >>> http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log >>> >>> >>> I am trying to understand but still not able to make sense out of it. >>> >>> Thanks >>> >>> Kashif >>> >>> On Wed, Jun 13, 2018 at 11:34 AM, Milind Changire <[email protected]> >>> wrote: >>> >>>> Kashif, >>>> FYI: http://debuginfo.centos.org/centos/6/storage/x86_64/ >>>> >>>> >>>> On Wed, Jun 13, 2018 at 3:21 PM, mohammad kashif <[email protected] >>>> > wrote: >>>> >>>>> Hi Milind >>>>> >>>>> There is no glusterfs-debuginfo available for gluster-3.12 from >>>>> http://mirror.centos.org/centos/6/storage/x86_64/gluster-3.12/ repo. >>>>> Do you know from where I can get it? >>>>> Also when I run gdb, it says >>>>> >>>>> Missing separate debuginfos, use: debuginfo-install >>>>> glusterfs-fuse-3.12.9-1.el6.x86_64 >>>>> >>>>> I can't find debug package for glusterfs-fuse either >>>>> >>>>> Thanks from the pit of despair ;) >>>>> >>>>> Kashif >>>>> >>>>> >>>>> On Tue, Jun 12, 2018 at 5:01 PM, mohammad kashif < >>>>> [email protected]> wrote: >>>>> >>>>>> Hi Milind >>>>>> >>>>>> I will send you links for logs. >>>>>> >>>>>> I collected these core dumps at client and there is no glusterd >>>>>> process running on client. >>>>>> >>>>>> Kashif >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Jun 12, 2018 at 4:14 PM, Milind Changire <[email protected] >>>>>> > wrote: >>>>>> >>>>>>> Kashif, >>>>>>> Could you also send over the client/mount log file as Vijay >>>>>>> suggested ? >>>>>>> Or maybe the lines with the crash backtrace lines >>>>>>> >>>>>>> Also, you've mentioned that you straced glusterd, but when you ran >>>>>>> gdb, you ran it over /usr/sbin/glusterfs >>>>>>> >>>>>>> >>>>>>> On Tue, Jun 12, 2018 at 8:19 PM, Vijay Bellur <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Jun 12, 2018 at 7:40 AM, mohammad kashif < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> Hi Milind >>>>>>>>> >>>>>>>>> The operating system is Scientific Linux 6 which is based on >>>>>>>>> RHEL6. The cpu arch is Intel x86_64. >>>>>>>>> >>>>>>>>> I will send you a separate email with link to core dump. >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> You could also grep for crash in the client log file and the lines >>>>>>>> following crash would have a backtrace in most cases. >>>>>>>> >>>>>>>> HTH, >>>>>>>> Vijay >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Thanks for your help. >>>>>>>>> >>>>>>>>> Kashif >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> Kashif, >>>>>>>>>> Could you share the core dump via Google Drive or something >>>>>>>>>> similar >>>>>>>>>> >>>>>>>>>> Also, let me know the CPU arch and OS Distribution on which you >>>>>>>>>> are running gluster. >>>>>>>>>> >>>>>>>>>> If you've installed the glusterfs-debuginfo package, you'll also >>>>>>>>>> get the source lines in the backtrace via gdb >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Milind, Vijay >>>>>>>>>>> >>>>>>>>>>> Thanks, I have some more information now as I straced glusterd >>>>>>>>>>> on client >>>>>>>>>>> >>>>>>>>>>> 138544 0.000131 mprotect(0x7f2f70785000, 4096, >>>>>>>>>>> PROT_READ|PROT_WRITE) = 0 <0.000026> >>>>>>>>>>> 138544 0.000128 mprotect(0x7f2f70786000, 4096, >>>>>>>>>>> PROT_READ|PROT_WRITE) = 0 <0.000027> >>>>>>>>>>> 138544 0.000126 mprotect(0x7f2f70787000, 4096, >>>>>>>>>>> PROT_READ|PROT_WRITE) = 0 <0.000027> >>>>>>>>>>> 138544 0.000124 --- SIGSEGV {si_signo=SIGSEGV, >>>>>>>>>>> si_code=SEGV_ACCERR, si_addr=0x7f2f7c60ef88} --- >>>>>>>>>>> 138544 0.000051 --- SIGSEGV {si_signo=SIGSEGV, >>>>>>>>>>> si_code=SI_KERNEL, si_addr=0} --- >>>>>>>>>>> 138551 0.105048 +++ killed by SIGSEGV (core dumped) +++ >>>>>>>>>>> 138550 0.000041 +++ killed by SIGSEGV (core dumped) +++ >>>>>>>>>>> 138547 0.000008 +++ killed by SIGSEGV (core dumped) +++ >>>>>>>>>>> 138546 0.000007 +++ killed by SIGSEGV (core dumped) +++ >>>>>>>>>>> 138545 0.000007 +++ killed by SIGSEGV (core dumped) +++ >>>>>>>>>>> 138544 0.000008 +++ killed by SIGSEGV (core dumped) +++ >>>>>>>>>>> 138543 0.000007 +++ killed by SIGSEGV (core dumped) +++ >>>>>>>>>>> >>>>>>>>>>> As for I understand that somehow gluster is trying to access >>>>>>>>>>> memory in appropriate manner and kernel sends SIGSEGV >>>>>>>>>>> >>>>>>>>>>> I also got the core dump. I am trying gdb first time so I am not >>>>>>>>>>> sure whether I am using it correctly >>>>>>>>>>> >>>>>>>>>>> gdb /usr/sbin/glusterfs core.138536 >>>>>>>>>>> >>>>>>>>>>> It just tell me that program terminated with signal 11, >>>>>>>>>>> segmentation fault . >>>>>>>>>>> >>>>>>>>>>> The problem is not limited to one client but happening to many >>>>>>>>>>> clients. >>>>>>>>>>> >>>>>>>>>>> I will really appreciate any help as whole file system has >>>>>>>>>>> become unusable >>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> >>>>>>>>>>> Kashif >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Jun 12, 2018 at 12:26 PM, Milind Changire < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> Kashif, >>>>>>>>>>>> You can change the log level by: >>>>>>>>>>>> $ gluster volume set <vol> diagnostics.brick-log-level TRACE >>>>>>>>>>>> $ gluster volume set <vol> diagnostics.client-log-level TRACE >>>>>>>>>>>> >>>>>>>>>>>> and see how things fare >>>>>>>>>>>> >>>>>>>>>>>> If you want fewer logs you can change the log-level to DEBUG >>>>>>>>>>>> instead of TRACE. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Jun 12, 2018 at 3:37 PM, mohammad kashif < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Vijay >>>>>>>>>>>>> >>>>>>>>>>>>> Now it is unmounting every 30 mins ! >>>>>>>>>>>>> >>>>>>>>>>>>> The server log at >>>>>>>>>>>>> /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log >>>>>>>>>>>>> have this line only >>>>>>>>>>>>> >>>>>>>>>>>>> 2018-06-12 09:53:19.303102] I [MSGID: 115013] >>>>>>>>>>>>> [server-helpers.c:289:do_fd_cleanup] 0-atlasglust-server: fd >>>>>>>>>>>>> cleanup on /atlas/atlasdata/zgubic/hmumu/ >>>>>>>>>>>>> histograms/v14.3/Signal >>>>>>>>>>>>> [2018-06-12 09:53:19.306190] I [MSGID: 101055] >>>>>>>>>>>>> [client_t.c:443:gf_client_unref] 0-atlasglust-server: >>>>>>>>>>>>> Shutting down connection <server-name> >>>>>>>>>>>>> -2224879-2018/06/12-09:51:01:4 >>>>>>>>>>>>> 60889-atlasglust-client-0-0-0 >>>>>>>>>>>>> >>>>>>>>>>>>> There is no other information. Is there any way to increase >>>>>>>>>>>>> log verbosity? >>>>>>>>>>>>> >>>>>>>>>>>>> on the client >>>>>>>>>>>>> >>>>>>>>>>>>> 2018-06-12 09:51:01.744980] I [MSGID: 114057] >>>>>>>>>>>>> [client-handshake.c:1478:select_server_supported_programs] >>>>>>>>>>>>> 0-atlasglust-client-5: Using Program GlusterFS 3.3, Num >>>>>>>>>>>>> (1298437), Version >>>>>>>>>>>>> (330) >>>>>>>>>>>>> [2018-06-12 09:51:01.746508] I [MSGID: 114046] >>>>>>>>>>>>> [client-handshake.c:1231:client_setvolume_cbk] >>>>>>>>>>>>> 0-atlasglust-client-5: Connected to atlasglust-client-5, attached >>>>>>>>>>>>> to remote >>>>>>>>>>>>> volume '/glusteratlas/brick006/gv0'. >>>>>>>>>>>>> [2018-06-12 09:51:01.746543] I [MSGID: 114047] >>>>>>>>>>>>> [client-handshake.c:1242:client_setvolume_cbk] >>>>>>>>>>>>> 0-atlasglust-client-5: Server and Client lk-version numbers are >>>>>>>>>>>>> not same, >>>>>>>>>>>>> reopening the fds >>>>>>>>>>>>> [2018-06-12 09:51:01.746814] I [MSGID: 114035] >>>>>>>>>>>>> [client-handshake.c:202:client_set_lk_version_cbk] >>>>>>>>>>>>> 0-atlasglust-client-5: Server lk version = 1 >>>>>>>>>>>>> [2018-06-12 09:51:01.748449] I [MSGID: 114057] >>>>>>>>>>>>> [client-handshake.c:1478:select_server_supported_programs] >>>>>>>>>>>>> 0-atlasglust-client-6: Using Program GlusterFS 3.3, Num >>>>>>>>>>>>> (1298437), Version >>>>>>>>>>>>> (330) >>>>>>>>>>>>> [2018-06-12 09:51:01.750219] I [MSGID: 114046] >>>>>>>>>>>>> [client-handshake.c:1231:client_setvolume_cbk] >>>>>>>>>>>>> 0-atlasglust-client-6: Connected to atlasglust-client-6, attached >>>>>>>>>>>>> to remote >>>>>>>>>>>>> volume '/glusteratlas/brick007/gv0'. >>>>>>>>>>>>> [2018-06-12 09:51:01.750261] I [MSGID: 114047] >>>>>>>>>>>>> [client-handshake.c:1242:client_setvolume_cbk] >>>>>>>>>>>>> 0-atlasglust-client-6: Server and Client lk-version numbers are >>>>>>>>>>>>> not same, >>>>>>>>>>>>> reopening the fds >>>>>>>>>>>>> [2018-06-12 09:51:01.750503] I [MSGID: 114035] >>>>>>>>>>>>> [client-handshake.c:202:client_set_lk_version_cbk] >>>>>>>>>>>>> 0-atlasglust-client-6: Server lk version = 1 >>>>>>>>>>>>> [2018-06-12 09:51:01.752207] I [fuse-bridge.c:4205:fuse_init] >>>>>>>>>>>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs >>>>>>>>>>>>> 7.24 kernel >>>>>>>>>>>>> 7.14 >>>>>>>>>>>>> [2018-06-12 09:51:01.752261] I >>>>>>>>>>>>> [fuse-bridge.c:4835:fuse_graph_sync] >>>>>>>>>>>>> 0-fuse: switched to graph 0 >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> is there a problem with server and client 1k version? >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for your help. >>>>>>>>>>>>> >>>>>>>>>>>>> Kashif >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Mon, Jun 11, 2018 at 11:52 PM, Vijay Bellur < >>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Mon, Jun 11, 2018 at 8:50 AM, mohammad kashif < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Since I have updated our gluster server and client to latest >>>>>>>>>>>>>>> version 3.12.9-1, I am having this issue of gluster getting >>>>>>>>>>>>>>> unmounted from >>>>>>>>>>>>>>> client very regularly. It was not a problem before update. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Its a distributed file system with no replication. We have >>>>>>>>>>>>>>> seven servers totaling around 480TB data. Its 97% full. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I am using following config on server >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> gluster volume set atlasglust features.cache-invalidation on >>>>>>>>>>>>>>> gluster volume set atlasglust >>>>>>>>>>>>>>> features.cache-invalidation-timeout >>>>>>>>>>>>>>> 600 >>>>>>>>>>>>>>> gluster volume set atlasglust performance.stat-prefetch on >>>>>>>>>>>>>>> gluster volume set atlasglust performance.cache-invalidation >>>>>>>>>>>>>>> on >>>>>>>>>>>>>>> gluster volume set atlasglust performance.md-cache-timeout >>>>>>>>>>>>>>> 600 >>>>>>>>>>>>>>> gluster volume set atlasglust performance.parallel-readdir on >>>>>>>>>>>>>>> gluster volume set atlasglust performance.cache-size 1GB >>>>>>>>>>>>>>> gluster volume set atlasglust performance.client-io-threads >>>>>>>>>>>>>>> on >>>>>>>>>>>>>>> gluster volume set atlasglust cluster.lookup-optimize on >>>>>>>>>>>>>>> gluster volume set atlasglust performance.stat-prefetch on >>>>>>>>>>>>>>> gluster volume set atlasglust client.event-threads 4 >>>>>>>>>>>>>>> gluster volume set atlasglust server.event-threads 4 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> clients are mounted with this option >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> defaults,direct-io-mode=disable,attribute-timeout=600,entry- >>>>>>>>>>>>>>> timeout=600,negative-timeout=600,fopen-keep-cache,rw,_netdev >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I can't see anything in the log file. Can someone suggest >>>>>>>>>>>>>>> that how to troubleshoot this issue? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Can you please share the log file? Checking for messages >>>>>>>>>>>>>> related to disconnections/crashes in the log file would be a >>>>>>>>>>>>>> good way to >>>>>>>>>>>>>> start troubleshooting the problem. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Vijay >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>>> [email protected] >>>>>>>>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Milind >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Milind >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Milind >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>>> >>>> -- >>>> Milind >>>> >>>> >>> >> >> >> -- >> Milind >> >> >
_______________________________________________ Gluster-users mailing list [email protected] http://lists.gluster.org/mailman/listinfo/gluster-users
