On Fri, Mar 10, 2017 at 12:50 PM, Sergei Gerasenko <[email protected]> wrote:
> I see why it's not saving the cores: the package isn't signed with the
> right signature. I will modify the abrt configs to change that behavior and
> wait for the next crash.
>

Ok, thanks. Please let us know when you get hold of the next core.

-Vijay

>
> On Fri, Mar 10, 2017 at 11:23 AM, Vijay Bellur <[email protected]> wrote:
>
>>
>>
>> On Fri, Mar 10, 2017 at 11:17 AM, Sergei Gerasenko <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm running gluster 3.7.12. It's an 8-node distributed, replicated
>>> cluster (replica 2). It had been working fine for a long time when, all of
>>> a sudden, I started seeing bricks going offline. Researching further, I
>>> found messages like this:
>>>
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: pending frames:
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: frame : type(0) op(5)
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: patchset: git://git.gluster.com/glusterfs.git
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: signal received: 6
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: time of crash:
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: 2017-03-10 05:02:12
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: configuration details:
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: argp 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: backtrace 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: dlfcn 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: libpthread 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: llistxattr 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: setfsid 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: spinlock 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: epoll.h 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: xattr.h 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: st_atim.tv_nsec 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: package-string: glusterfs 3.7.12
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: ---------
>>>
>>> I initially thought it was related to quota support (based on some
>>> googling), so I turned off quota and also disabled NFS support to simplify
>>> the debugging. Every time after a crash, I restarted gluster, and the
>>> bricks would come back online for several hours, only to crash again.
>>> There are lots of messages like this preceding the crash:
>>>
>>> ...
>>> [2017-03-10 04:40:46.002225] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-ftp_volume-posix: null gfid for path (null)
>>> [2017-03-10 04:40:46.002278] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-ftp_volume-posix: lstat on null failed [Invalid argument]
>>> The message "E [MSGID: 113091] [posix.c:178:posix_lookup] 0-ftp_volume-posix: null gfid for path (null)" repeated 3 times between [2017-03-10 04:40:46.002225] and [2017-03-10 04:40:46.005699]
>>> The message "E [MSGID: 113018] [posix.c:196:posix_lookup] 0-ftp_volume-posix: lstat on null failed [Invalid argument]" repeated 3 times between [2017-03-10 04:40:46.002278] and [2017-03-10 04:40:46.005701]
>>> [2017-03-10 04:50:47.002170] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-ftp_volume-posix: null gfid for path (null)
>>> [2017-03-10 04:50:47.002219] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-ftp_volume-posix: lstat on null failed [Invalid argument]
>>> The message "E [MSGID: 113091] [posix.c:178:posix_lookup] 0-ftp_volume-posix: null gfid for path (null)" repeated 3 times between [2017-03-10 04:50:47.002170] and [2017-03-10 04:50:47.005623]
>>> The message "E [MSGID: 113018] [posix.c:196:posix_lookup] 0-ftp_volume-posix: lstat on null failed [Invalid argument]" repeated 3 times between [2017-03-10 04:50:47.002219] and [2017-03-10 04:50:47.005625]
>>> [2017-03-10 05:00:48.002246] E [MSGID: 113091] [posix.c:178:posix_lookup] 0-ftp_volume-posix: null gfid for path (null)
>>> [2017-03-10 05:00:48.002314] E [MSGID: 113018] [posix.c:196:posix_lookup] 0-ftp_volume-posix: lstat on null failed [Invalid argument]
>>> The message "E [MSGID: 113091] [posix.c:178:posix_lookup] 0-ftp_volume-posix: null gfid for path (null)" repeated 3 times between [2017-03-10 05:00:48.002246] and [2017-03-10 05:00:48.005828]
>>> The message "E [MSGID: 113018] [posix.c:196:posix_lookup] 0-ftp_volume-posix: lstat on null failed [Invalid argument]" repeated 3 times between [2017-03-10 05:00:48.002314] and [2017-03-10 05:00:48.005830]
>>>
>>> One important detail I noticed yesterday is that one of the nodes was
>>> running gluster version 3.7.13! I'm not sure what performed the upgrade.
>>> So I downgraded it to 3.7.12 and restarted gluster. The crash above
>>> happened several hours later. But again, the crashes had been happening
>>> before the downgrade, possibly because of the version mismatch on one of
>>> the nodes.
>>>
>>> Does anybody have any ideas?
>>>
>>>
>>
>> Do you have the core files from the crashes? If so, can you please
>> provide a gdb backtrace from one of the core files?
>>
>> Thanks,
>> Vijay
>>
>
>
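For anyone following along, a common way to produce the gdb backtrace requested in this thread is to run gdb non-interactively against the brick binary and the saved core. The Python sketch below is a thin wrapper around that invocation. It makes several assumptions: the brick binary lives at /usr/sbin/glusterfsd, which is typical for RPM installs, and the helper names gdb_backtrace_cmd and collect_backtrace are hypothetical, not part of Gluster's tooling. On RPM-based systems, installing the debuginfo packages that match the running glusterfs version (for example with debuginfo-install) usually makes the frames far more readable.

```python
import shutil
import subprocess
from typing import List

# Assumption: brick processes run this binary (typical RPM install path).
GLUSTERFSD = "/usr/sbin/glusterfsd"


def gdb_backtrace_cmd(core_path: str, binary: str = GLUSTERFSD) -> List[str]:
    """Hypothetical helper: build a non-interactive gdb command line that
    prints a full backtrace (with locals) for every thread in the core."""
    return [
        "gdb",
        "-batch",                           # exit after running the -ex commands
        "-ex", "thread apply all bt full",  # backtrace of all threads, with locals
        binary,                             # executable that produced the core
        core_path,                          # the saved core file
    ]


def collect_backtrace(core_path: str, binary: str = GLUSTERFSD) -> str:
    """Hypothetical helper: run gdb on the core and return its combined
    stdout/stderr as text, suitable for pasting into a reply."""
    if shutil.which("gdb") is None:
        raise RuntimeError("gdb is not installed on this host")
    result = subprocess.run(
        gdb_backtrace_cmd(core_path, binary),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # keep gdb warnings alongside the frames
        universal_newlines=True,    # decode output as text
        check=False,                # gdb may exit non-zero yet still print frames
    )
    return result.stdout
```

Usage might look like `print(collect_backtrace("/path/to/core"))`, where the path is wherever ABRT (or the kernel's core_pattern setting) saved the dump. Merging stderr into stdout keeps gdb's "missing debuginfo" warnings next to the frames they affect.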
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
