Hi Milind

Thanks a lot, I managed to run gdb and produced a backtrace as well. It is here:

http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log

I am trying to understand it but am still not able to make sense of it.
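For reference, a fuller, file-ready backtrace can usually be pulled out of the same core with a few gdb commands along these lines (only a sketch; the binary path and core file name are the ones quoted below in this thread, and the source lines only show up once glusterfs-debuginfo is installed):

$ gdb /usr/sbin/glusterfs core.138536
(gdb) set pagination off
(gdb) bt full
(gdb) thread apply all bt
(gdb) quit

Here "bt full" prints the crashing thread with its local variables, and "thread apply all bt" dumps every thread, which is usually what the developers ask for. The same output can be captured non-interactively, e.g. with gdb /usr/sbin/glusterfs core.138536 -batch -ex "thread apply all bt full" > backtrace.log, which is convenient for attaching to a mail.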
Thanks

Kashif

On Wed, Jun 13, 2018 at 11:34 AM, Milind Changire <[email protected]> wrote:

> Kashif,
> FYI: http://debuginfo.centos.org/centos/6/storage/x86_64/
>
> On Wed, Jun 13, 2018 at 3:21 PM, mohammad kashif <[email protected]> wrote:
>
>> Hi Milind
>>
>> There is no glusterfs-debuginfo available for gluster-3.12 from the
>> http://mirror.centos.org/centos/6/storage/x86_64/gluster-3.12/ repo. Do
>> you know where I can get it?
>> Also, when I run gdb, it says:
>>
>> Missing separate debuginfos, use: debuginfo-install
>> glusterfs-fuse-3.12.9-1.el6.x86_64
>>
>> I can't find a debug package for glusterfs-fuse either.
>>
>> Thanks from the pit of despair ;)
>>
>> Kashif
>>
>> On Tue, Jun 12, 2018 at 5:01 PM, mohammad kashif <[email protected]> wrote:
>>
>>> Hi Milind
>>>
>>> I will send you links for the logs.
>>>
>>> I collected these core dumps on the client and there is no glusterd
>>> process running on the client.
>>>
>>> Kashif
>>>
>>> On Tue, Jun 12, 2018 at 4:14 PM, Milind Changire <[email protected]> wrote:
>>>
>>>> Kashif,
>>>> Could you also send over the client/mount log file as Vijay suggested?
>>>> Or maybe just the lines around the crash backtrace.
>>>>
>>>> Also, you've mentioned that you straced glusterd, but when you ran gdb,
>>>> you ran it over /usr/sbin/glusterfs.
>>>>
>>>> On Tue, Jun 12, 2018 at 8:19 PM, Vijay Bellur <[email protected]> wrote:
>>>>
>>>>> On Tue, Jun 12, 2018 at 7:40 AM, mohammad kashif <[email protected]> wrote:
>>>>>
>>>>>> Hi Milind
>>>>>>
>>>>>> The operating system is Scientific Linux 6, which is based on RHEL 6.
>>>>>> The CPU arch is Intel x86_64.
>>>>>>
>>>>>> I will send you a separate email with a link to the core dump.
>>>>>
>>>>> You could also grep for crash in the client log file; the lines
>>>>> following the crash would have a backtrace in most cases.
>>>>>
>>>>> HTH,
>>>>> Vijay
>>>>>
>>>>>> Thanks for your help.
>>>>>>
>>>>>> Kashif
>>>>>>
>>>>>> On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire <[email protected]> wrote:
>>>>>>
>>>>>>> Kashif,
>>>>>>> Could you share the core dump via Google Drive or something similar?
>>>>>>>
>>>>>>> Also, let me know the CPU arch and OS distribution on which you are
>>>>>>> running gluster.
>>>>>>>
>>>>>>> If you've installed the glusterfs-debuginfo package, you'll also get
>>>>>>> the source lines in the backtrace via gdb.
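To get those source lines, the debuginfo repository Milind pointed at can be wired up on an EL6 client roughly like this (a sketch, not tested here; the repo id and file name are arbitrary, and gpgcheck is disabled only for brevity):

$ cat > /etc/yum.repos.d/centos-storage-debuginfo.repo <<'EOF'
[centos-storage-sig-debuginfo]
name=CentOS Storage SIG debuginfo
baseurl=http://debuginfo.centos.org/centos/6/storage/x86_64/
enabled=1
gpgcheck=0
EOF
$ yum install yum-utils
$ debuginfo-install glusterfs-fuse-3.12.9-1.el6.x86_64

Alternatively, the glusterfs-debuginfo RPM matching 3.12.9-1 can be downloaded from that URL by hand and installed with yum localinstall.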
>>>>>>> On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Milind, Vijay
>>>>>>>>
>>>>>>>> Thanks, I have some more information now as I straced glusterd on
>>>>>>>> the client:
>>>>>>>>
>>>>>>>> 138544 0.000131 mprotect(0x7f2f70785000, 4096, PROT_READ|PROT_WRITE) = 0 <0.000026>
>>>>>>>> 138544 0.000128 mprotect(0x7f2f70786000, 4096, PROT_READ|PROT_WRITE) = 0 <0.000027>
>>>>>>>> 138544 0.000126 mprotect(0x7f2f70787000, 4096, PROT_READ|PROT_WRITE) = 0 <0.000027>
>>>>>>>> 138544 0.000124 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7f2f7c60ef88} ---
>>>>>>>> 138544 0.000051 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
>>>>>>>> 138551 0.105048 +++ killed by SIGSEGV (core dumped) +++
>>>>>>>> 138550 0.000041 +++ killed by SIGSEGV (core dumped) +++
>>>>>>>> 138547 0.000008 +++ killed by SIGSEGV (core dumped) +++
>>>>>>>> 138546 0.000007 +++ killed by SIGSEGV (core dumped) +++
>>>>>>>> 138545 0.000007 +++ killed by SIGSEGV (core dumped) +++
>>>>>>>> 138544 0.000008 +++ killed by SIGSEGV (core dumped) +++
>>>>>>>> 138543 0.000007 +++ killed by SIGSEGV (core dumped) +++
>>>>>>>>
>>>>>>>> As far as I understand, gluster is somehow trying to access memory
>>>>>>>> in an inappropriate manner and the kernel sends SIGSEGV.
>>>>>>>>
>>>>>>>> I also got the core dump. I am trying gdb for the first time, so I
>>>>>>>> am not sure whether I am using it correctly:
>>>>>>>>
>>>>>>>> gdb /usr/sbin/glusterfs core.138536
>>>>>>>>
>>>>>>>> It just tells me that the program terminated with signal 11,
>>>>>>>> segmentation fault.
>>>>>>>>
>>>>>>>> The problem is not limited to one client but is happening on many
>>>>>>>> clients.
>>>>>>>>
>>>>>>>> I will really appreciate any help, as the whole file system has
>>>>>>>> become unusable.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Kashif
>>>>>>>>
>>>>>>>> On Tue, Jun 12, 2018 at 12:26 PM, Milind Changire <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Kashif,
>>>>>>>>> You can change the log level by:
>>>>>>>>> $ gluster volume set <vol> diagnostics.brick-log-level TRACE
>>>>>>>>> $ gluster volume set <vol> diagnostics.client-log-level TRACE
>>>>>>>>>
>>>>>>>>> and see how things fare.
>>>>>>>>>
>>>>>>>>> If you want fewer logs you can change the log-level to DEBUG
>>>>>>>>> instead of TRACE.
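If volume-wide TRACE turns out to be too noisy, two related knobs may help (both are standard GlusterFS options, but worth double-checking against the installed 3.12 documentation): the fuse mount's own log level can be raised for a single client at mount time, and the diagnostics levels can be put back to their defaults afterwards with volume reset. For example, with the mount point and debug log path here being placeholders only:

$ mount -t glusterfs -o log-level=DEBUG,log-file=/var/log/glusterfs/atlas-debug.log <server>:/atlasglust /mnt/atlas
$ gluster volume reset atlasglust diagnostics.client-log-level
$ gluster volume reset atlasglust diagnostics.brick-log-level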
>>>>>>>>> On Tue, Jun 12, 2018 at 3:37 PM, mohammad kashif <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Vijay
>>>>>>>>>>
>>>>>>>>>> Now it is unmounting every 30 mins!
>>>>>>>>>>
>>>>>>>>>> The server log at /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log
>>>>>>>>>> has only these lines:
>>>>>>>>>>
>>>>>>>>>> [2018-06-12 09:53:19.303102] I [MSGID: 115013] [server-helpers.c:289:do_fd_cleanup] 0-atlasglust-server: fd cleanup on /atlas/atlasdata/zgubic/hmumu/histograms/v14.3/Signal
>>>>>>>>>> [2018-06-12 09:53:19.306190] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-atlasglust-server: Shutting down connection <server-name>-2224879-2018/06/12-09:51:01:460889-atlasglust-client-0-0-0
>>>>>>>>>>
>>>>>>>>>> There is no other information. Is there any way to increase log
>>>>>>>>>> verbosity?
>>>>>>>>>>
>>>>>>>>>> On the client:
>>>>>>>>>>
>>>>>>>>>> [2018-06-12 09:51:01.744980] I [MSGID: 114057] [client-handshake.c:1478:select_server_supported_programs] 0-atlasglust-client-5: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>>>>>>>>> [2018-06-12 09:51:01.746508] I [MSGID: 114046] [client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-5: Connected to atlasglust-client-5, attached to remote volume '/glusteratlas/brick006/gv0'.
>>>>>>>>>> [2018-06-12 09:51:01.746543] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-5: Server and Client lk-version numbers are not same, reopening the fds
>>>>>>>>>> [2018-06-12 09:51:01.746814] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-atlasglust-client-5: Server lk version = 1
>>>>>>>>>> [2018-06-12 09:51:01.748449] I [MSGID: 114057] [client-handshake.c:1478:select_server_supported_programs] 0-atlasglust-client-6: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>>>>>>>>> [2018-06-12 09:51:01.750219] I [MSGID: 114046] [client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-6: Connected to atlasglust-client-6, attached to remote volume '/glusteratlas/brick007/gv0'.
>>>>>>>>>> [2018-06-12 09:51:01.750261] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-6: Server and Client lk-version numbers are not same, reopening the fds
>>>>>>>>>> [2018-06-12 09:51:01.750503] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-atlasglust-client-6: Server lk version = 1
>>>>>>>>>> [2018-06-12 09:51:01.752207] I [fuse-bridge.c:4205:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.14
>>>>>>>>>> [2018-06-12 09:51:01.752261] I [fuse-bridge.c:4835:fuse_graph_sync] 0-fuse: switched to graph 0
>>>>>>>>>>
>>>>>>>>>> Is there a problem with the server and client lk-version?
>>>>>>>>>>
>>>>>>>>>> Thanks for your help.
>>>>>>>>>>
>>>>>>>>>> Kashif
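On Vijay's earlier point about grepping the client log: when the fuse client segfaults it normally writes a crash section into its own log before dying, so something along these lines should locate it (the log file name is derived from the mount point and is only a guess here):

$ grep -n "signal received" /var/log/glusterfs/mnt-atlas.log
$ grep -A 30 "signal received" /var/log/glusterfs/mnt-atlas.log

The lines after "signal received: 11" normally contain the frame list that Milind and Vijay are asking for.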
>>>>>>>>>> On Mon, Jun 11, 2018 at 11:52 PM, Vijay Bellur <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 11, 2018 at 8:50 AM, mohammad kashif <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi
>>>>>>>>>>>>
>>>>>>>>>>>> Since I updated our gluster server and client to the latest
>>>>>>>>>>>> version (3.12.9-1), I am having this issue of gluster getting
>>>>>>>>>>>> unmounted from clients very regularly. It was not a problem
>>>>>>>>>>>> before the update.
>>>>>>>>>>>>
>>>>>>>>>>>> It's a distributed file system with no replication. We have
>>>>>>>>>>>> seven servers totaling around 480 TB of data. It is 97% full.
>>>>>>>>>>>>
>>>>>>>>>>>> I am using the following config on the servers:
>>>>>>>>>>>>
>>>>>>>>>>>> gluster volume set atlasglust features.cache-invalidation on
>>>>>>>>>>>> gluster volume set atlasglust features.cache-invalidation-timeout 600
>>>>>>>>>>>> gluster volume set atlasglust performance.stat-prefetch on
>>>>>>>>>>>> gluster volume set atlasglust performance.cache-invalidation on
>>>>>>>>>>>> gluster volume set atlasglust performance.md-cache-timeout 600
>>>>>>>>>>>> gluster volume set atlasglust performance.parallel-readdir on
>>>>>>>>>>>> gluster volume set atlasglust performance.cache-size 1GB
>>>>>>>>>>>> gluster volume set atlasglust performance.client-io-threads on
>>>>>>>>>>>> gluster volume set atlasglust cluster.lookup-optimize on
>>>>>>>>>>>> gluster volume set atlasglust performance.stat-prefetch on
>>>>>>>>>>>> gluster volume set atlasglust client.event-threads 4
>>>>>>>>>>>> gluster volume set atlasglust server.event-threads 4
>>>>>>>>>>>>
>>>>>>>>>>>> Clients are mounted with these options:
>>>>>>>>>>>>
>>>>>>>>>>>> defaults,direct-io-mode=disable,attribute-timeout=600,entry-timeout=600,negative-timeout=600,fopen-keep-cache,rw,_netdev
>>>>>>>>>>>>
>>>>>>>>>>>> I can't see anything in the log file. Can someone suggest how to
>>>>>>>>>>>> troubleshoot this issue?
>>>>>>>>>>>
>>>>>>>>>>> Can you please share the log file? Checking for messages related
>>>>>>>>>>> to disconnections/crashes in the log file would be a good way to
>>>>>>>>>>> start troubleshooting the problem.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Vijay
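One small check that may also be worth doing after the 3.12 upgrade is confirming which of those options actually took effect on the volume, for example (the grep pattern is only an illustration):

$ gluster volume info atlasglust
$ gluster volume get atlasglust all | grep -E 'cache|readdir|event-threads|lookup-optimize'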
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
