Thanks Artem. Can you send us the core dump, or a backtrace (bt) with symbols, from the crash?
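In case it helps with capturing one, here is a rough sketch of the usual steps (assuming the crashing process is the fuse client /usr/sbin/glusterfs and that the node either runs systemd-coredump or writes a classic core file; the debuginfo package name below is a guess for openSUSE and may differ in your repos):

    # allow core dumps for the mount process
    ulimit -c unlimited

    # install debug symbols so the backtrace is readable (package name may vary)
    zypper install glusterfs-debuginfo

    # if systemd-coredump is in use, crashes are collected automatically:
    coredumpctl list /usr/sbin/glusterfs
    coredumpctl gdb /usr/sbin/glusterfs

    # otherwise, open the core file directly in gdb:
    gdb /usr/sbin/glusterfs /path/to/core
    (gdb) set pagination off
    (gdb) thread apply all bt full

Attaching the full "thread apply all bt full" output (or the core itself) to the bug would be ideal.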
Regards, Nithya On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <archon...@gmail.com> wrote: > Sorry to disappoint, but the crash just happened again, so lru-limit=0 > didn't help. > > Here's the snippet of the crash and the subsequent remount by monit. > > > [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] > (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) > [0x7f4402b99329] > -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) > [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) > [0x7f440b6b5218] ) 0-dict: dict is NULL [In > valid argument] > The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] > 0-<SNIP>_data1-replicate-0: selecting local read_child > <SNIP>_data1-client-3" repeated 39 times between [2019-02-08 > 01:11:18.043286] and [2019-02-08 01:13:07.915604] > The message "E [MSGID: 101191] > [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch > handler" repeated 515 times between [2019-02-08 01:11:17.932515] and > [2019-02-08 01:13:09.311554] > pending frames: > frame : type(1) op(LOOKUP) > frame : type(0) op(0) > patchset: git://git.gluster.org/glusterfs.git > signal received: 6 > time of crash: > 2019-02-08 01:13:09 > configuration details: > argp 1 > backtrace 1 > dlfcn 1 > libpthread 1 > llistxattr 1 > setfsid 1 > spinlock 1 > epoll.h 1 > xattr.h 1 > st_atim.tv_nsec 1 > package-string: glusterfs 5.3 > /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c] > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6] > /lib64/libc.so.6(+0x36160)[0x7f440a887160] > /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0] > /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1] > /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa] > /lib64/libc.so.6(+0x2e772)[0x7f440a87f772] > /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8] > > /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d] > > /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1] > > /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f] > /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820] > /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f] > /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063] > /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2] > /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3] > /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559] > /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f] > --------- > [2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main] > 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 > (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse > --volfile-server=localhost --volfile-id=/<SNIP>_data1 /mnt/<SNIP>_data1) > [2019-02-08 01:13:35.637830] I [MSGID: 101190] > [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [2019-02-08 01:13:35.651405] I [MSGID: 101190] > [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 2 > [2019-02-08 01:13:35.651628] I [MSGID: 101190] > [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 3 > [2019-02-08 01:13:35.651747] I [MSGID: 101190] > [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 4 > [2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify] > 0-<SNIP>_data1-client-0: parent translators are ready, attempting connect > on transport > [2019-02-08 01:13:35.652978] 
I [MSGID: 114020] [client.c:2354:notify] > 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect > on transport > [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] > 0-<SNIP>_data1-client-2: parent translators are ready, attempting connect > on transport > [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] > 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect > on transport > [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] > 0-<SNIP>_data1-client-0: changing port to 49153 (from 0) > Final graph: > > > Sincerely, > Artem > > -- > Founder, Android Police <http://www.androidpolice.com>, APK Mirror > <http://www.apkmirror.com/>, Illogical Robot LLC > beerpla.net | +ArtemRussakovskii > <https://plus.google.com/+ArtemRussakovskii> | @ArtemR > <http://twitter.com/ArtemR> > > > On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii <archon...@gmail.com> > wrote: > >> I've added the lru-limit=0 parameter to the mounts, and I see it's taken >> effect correctly: >> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse >> --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" >> >> Let's see if it stops crashing or not. >> >> Sincerely, >> Artem >> >> -- >> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >> <http://www.apkmirror.com/>, Illogical Robot LLC >> beerpla.net | +ArtemRussakovskii >> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >> <http://twitter.com/ArtemR> >> >> >> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii <archon...@gmail.com> >> wrote: >> >>> Hi Nithya, >>> >>> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing >>> crashes, and no further releases have been made yet. >>> >>> volume info: >>> Type: Replicate >>> Volume ID: ****SNIP**** >>> Status: Started >>> Snapshot Count: 0 >>> Number of Bricks: 1 x 4 = 4 >>> Transport-type: tcp >>> Bricks: >>> Brick1: ****SNIP**** >>> Brick2: ****SNIP**** >>> Brick3: ****SNIP**** >>> Brick4: ****SNIP**** >>> Options Reconfigured: >>> cluster.quorum-count: 1 >>> cluster.quorum-type: fixed >>> network.ping-timeout: 5 >>> network.remote-dio: enable >>> performance.rda-cache-limit: 256MB >>> performance.readdir-ahead: on >>> performance.parallel-readdir: on >>> network.inode-lru-limit: 500000 >>> performance.md-cache-timeout: 600 >>> performance.cache-invalidation: on >>> performance.stat-prefetch: on >>> features.cache-invalidation-timeout: 600 >>> features.cache-invalidation: on >>> cluster.readdir-optimize: on >>> performance.io-thread-count: 32 >>> server.event-threads: 4 >>> client.event-threads: 4 >>> performance.read-ahead: off >>> cluster.lookup-optimize: on >>> performance.cache-size: 1GB >>> cluster.self-heal-daemon: enable >>> transport.address-family: inet >>> nfs.disable: on >>> performance.client-io-threads: on >>> cluster.granular-entry-heal: enable >>> cluster.data-self-heal-algorithm: full >>> >>> Sincerely, >>> Artem >>> >>> -- >>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>> <http://www.apkmirror.com/>, Illogical Robot LLC >>> beerpla.net | +ArtemRussakovskii >>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>> <http://twitter.com/ArtemR> >>> >>> >>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran <nbala...@redhat.com> >>> wrote: >>> >>>> Hi Artem, >>>> >>>> Do you still see the crashes with 5.3? If yes, please try mount the >>>> volume using the mount option lru-limit=0 and see if that helps. 
We are >>>> looking into the crashes and will update when have a fix. >>>> >>>> Also, please provide the gluster volume info for the volume in question. >>>> >>>> >>>> regards, >>>> Nithya >>>> >>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii <archon...@gmail.com> >>>> wrote: >>>> >>>>> The fuse crash happened two more times, but this time monit helped >>>>> recover within 1 minute, so it's a great workaround for now. >>>>> >>>>> What's odd is that the crashes are only happening on one of 4 servers, >>>>> and I don't know why. >>>>> >>>>> Sincerely, >>>>> Artem >>>>> >>>>> -- >>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>> beerpla.net | +ArtemRussakovskii >>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>> <http://twitter.com/ArtemR> >>>>> >>>>> >>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii < >>>>> archon...@gmail.com> wrote: >>>>> >>>>>> The fuse crash happened again yesterday, to another volume. Are there >>>>>> any mount options that could help mitigate this? >>>>>> >>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) task >>>>>> to watch and restart the mount, which works and recovers the mount point >>>>>> within a minute. Not ideal, but a temporary workaround. >>>>>> >>>>>> By the way, the way to reproduce this "Transport endpoint is not >>>>>> connected" condition for testing purposes is to kill -9 the right >>>>>> "glusterfs --process-name fuse" process. >>>>>> >>>>>> >>>>>> monit check: >>>>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1 >>>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>>> if space usage > 90% for 5 times within 15 cycles >>>>>> then alert else if succeeded for 10 cycles then alert >>>>>> >>>>>> >>>>>> stack trace: >>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>> [0x7fa0249e4329] >>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>> [0x7fa0249e4329] >>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>> The message "E [MSGID: 101191] >>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to >>>>>> dispatch >>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>>> [2019-02-01 23:21:56.164427] >>>>>> The message "I [MSGID: 108031] >>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times between >>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>>> pending frames: >>>>>> frame : type(1) op(LOOKUP) >>>>>> frame : type(0) op(0) >>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>> signal received: 6 >>>>>> time of crash: >>>>>> 2019-02-01 23:22:03 >>>>>> configuration details: >>>>>> argp 1 >>>>>> backtrace 1 >>>>>> dlfcn 1 >>>>>> libpthread 1 >>>>>> llistxattr 1 >>>>>> setfsid 1 >>>>>> spinlock 1 >>>>>> epoll.h 1 >>>>>> xattr.h 1 >>>>>> 
st_atim.tv_nsec 1 >>>>>> package-string: glusterfs 5.3 >>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>>> >>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>>> >>>>>> Sincerely, >>>>>> Artem >>>>>> >>>>>> -- >>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>> beerpla.net | +ArtemRussakovskii >>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>> <http://twitter.com/ArtemR> >>>>>> >>>>>> >>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii < >>>>>> archon...@gmail.com> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> The first (and so far only) crash happened at 2am the next day after >>>>>>> we upgraded, on only one of four servers and only to one of two mounts. >>>>>>> >>>>>>> I have no idea what caused it, but yeah, we do have a pretty busy >>>>>>> site (apkmirror.com), and it caused a disruption for any uploads or >>>>>>> downloads from that server until I woke up and fixed the mount. >>>>>>> >>>>>>> I wish I could be more helpful but all I have is that stack trace. >>>>>>> >>>>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>>>> >>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>>> atumb...@redhat.com> wrote: >>>>>>> >>>>>>>> Hi Artem, >>>>>>>> >>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (ie, as >>>>>>>> a clone of other bugs where recent discussions happened), and marked >>>>>>>> it as >>>>>>>> a blocker for glusterfs-5.4 release. >>>>>>>> >>>>>>>> We already have fixes for log flooding - >>>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>>> identifying and fixing the issue seen with crash. >>>>>>>> >>>>>>>> Can you please tell if the crashes happened as soon as upgrade ? or >>>>>>>> was there any particular pattern you observed before the crash. 
>>>>>>>> >>>>>>>> -Amar >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>>> archon...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I >>>>>>>>> already got a crash which others have mentioned in >>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to >>>>>>>>> unmount, kill gluster, and remount: >>>>>>>>> >>>>>>>>> >>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>> [0x7fcccafcd329] >>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>> [0x7fcccafcd329] >>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>> [0x7fcccafcd329] >>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>> [0x7fcccafcd329] >>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times >>>>>>>>> between >>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to >>>>>>>>> dispatch >>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>>> pending frames: >>>>>>>>> frame : type(1) op(READ) >>>>>>>>> frame : type(1) op(OPEN) >>>>>>>>> frame : type(0) op(0) >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>> signal received: 6 >>>>>>>>> time of crash: >>>>>>>>> 2019-01-31 09:38:04 >>>>>>>>> configuration details: >>>>>>>>> argp 1 >>>>>>>>> backtrace 1 >>>>>>>>> dlfcn 1 >>>>>>>>> libpthread 1 >>>>>>>>> llistxattr 1 >>>>>>>>> setfsid 1 >>>>>>>>> spinlock 1 >>>>>>>>> epoll.h 1 >>>>>>>>> xattr.h 1 >>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>>> 
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>>> --------- >>>>>>>>> >>>>>>>>> Do the pending patches fix the crash or only the repeated >>>>>>>>> warnings? I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>>> not too sure how to make it core dump. >>>>>>>>> >>>>>>>>> If it's not fixed by the patches above, has anyone already opened >>>>>>>>> a ticket for the crashes that I can join and monitor? This is going to >>>>>>>>> create a massive problem for us since production systems are crashing. >>>>>>>>> >>>>>>>>> Thanks. >>>>>>>>> >>>>>>>>> Sincerely, >>>>>>>>> Artem >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>>> rgowd...@redhat.com> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>>> archon...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Also, not sure if related or not, but I got a ton of these >>>>>>>>>>> "Failed to dispatch handler" in my logs as well. Many people have >>>>>>>>>>> been >>>>>>>>>>> commenting about this issue here >>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses this. 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> ==> mnt-SITE_data1.log <== >>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>> ==> mnt-SITE_data3.log <== >>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to >>>>>>>>>>>> dispatch >>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] >>>>>>>>>>>> and >>>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] >>>>>>>>>>>> 2-SITE_data3-replicate-0: >>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times >>>>>>>>>>>> between >>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>>> ==> mnt-SITE_data1.log <== >>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] >>>>>>>>>>>> 2-SITE_data1-replicate-0: >>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times >>>>>>>>>>>> between >>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to >>>>>>>>>>>> dispatch >>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] >>>>>>>>>>>> and >>>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] >>>>>>>>>>>> 2-SITE_data1-replicate-0: >>>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>>> ==> mnt-SITE_data3.log <== >>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] >>>>>>>>>>>> 2-SITE_data3-replicate-0: >>>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>>> ==> mnt-SITE_data1.log <== >>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to >>>>>>>>>>>> dispatch >>>>>>>>>>>> handler >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I'm hoping raising the issue here on the mailing list may bring >>>>>>>>>>> some additional eyeballs and get them both fixed. >>>>>>>>>>> >>>>>>>>>>> Thanks. >>>>>>>>>>> >>>>>>>>>>> Sincerely, >>>>>>>>>>> Artem >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>>> archon...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> I found a similar issue here: >>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's a >>>>>>>>>>>> comment from 3 days ago from someone else with 5.3 who started >>>>>>>>>>>> seeing the >>>>>>>>>>>> spam. 
>>>>>>>>>>>> >>>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> +Milind Changire <mchan...@redhat.com> Can you check why this >>>>>>>>>> message is logged and send a fix? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>>> >>>>>>>>>>>> Thanks. >>>>>>>>>>>> >>>>>>>>>>>> Sincerely, >>>>>>>>>>>> Artem >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>> Gluster-users@gluster.org >>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>> Gluster-users mailing list >>>>>>>>> Gluster-users@gluster.org >>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Amar Tumballi (amarts) >>>>>>>> >>>>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users@gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>>