Hi Nithya, I can try disabling write-behind as long as it doesn't heavily impact performance for us. Which option is it exactly? I don't see it set in the list of changed volume options that I sent you earlier.
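If the option you mean is performance.write-behind (that's just my guess, please correct me if it's a different knob), I'd plan to run something like this on the affected volumes:

# check the current value first
gluster volume get <SNIP>_data1 performance.write-behind
# then turn it off on the volume
gluster volume set <SNIP>_data1 performance.write-behind off

Does the fuse client pick up that change automatically, or should I remount after setting it?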
Sincerely, Artem -- Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran <[email protected]> wrote: > Hi Artem, > > We have found the cause of one crash. Unfortunately we have not managed to > reproduce the one you reported so we don't know if it is the same cause. > > Can you disable write-behind on the volume and let us know if it solves > the problem? If yes, it is likely to be the same issue. > > > regards, > Nithya > > On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <[email protected]> > wrote: > >> Sorry to disappoint, but the crash just happened again, so lru-limit=0 >> didn't help. >> >> Here's the snippet of the crash and the subsequent remount by monit. >> >> >> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] >> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >> [0x7f4402b99329] >> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >> [0x7f440b6b5218] ) 0-dict: dict is NULL [In >> valid argument] >> The message "I [MSGID: 108031] >> [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: >> selecting local read_child <SNIP>_data1-client-3" repeated 39 times between >> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604] >> The message "E [MSGID: 101191] >> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and >> [2019-02-08 01:13:09.311554] >> pending frames: >> frame : type(1) op(LOOKUP) >> frame : type(0) op(0) >> patchset: git://git.gluster.org/glusterfs.git >> signal received: 6 >> time of crash: >> 2019-02-08 01:13:09 >> configuration details: >> argp 1 >> backtrace 1 >> dlfcn 1 >> libpthread 1 >> llistxattr 1 >> setfsid 1 >> spinlock 1 >> epoll.h 1 >> xattr.h 1 >> st_atim.tv_nsec 1 >> package-string: glusterfs 5.3 >> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c] >> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6] >> /lib64/libc.so.6(+0x36160)[0x7f440a887160] >> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0] >> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1] >> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa] >> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772] >> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8] >> >> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d] >> >> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1] >> >> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f] >> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820] >> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f] >> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063] >> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2] >> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3] >> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559] >> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f] >> --------- >> [2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main] >> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 >> (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse >> --volfile-server=localhost --volfile-id=/<SNIP>_data1 /mnt/<SNIP>_data1) >> [2019-02-08 01:13:35.637830] I 
[MSGID: 101190] >> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >> with index 1 >> [2019-02-08 01:13:35.651405] I [MSGID: 101190] >> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >> with index 2 >> [2019-02-08 01:13:35.651628] I [MSGID: 101190] >> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >> with index 3 >> [2019-02-08 01:13:35.651747] I [MSGID: 101190] >> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >> with index 4 >> [2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify] >> 0-<SNIP>_data1-client-0: parent translators are ready, attempting connect >> on transport >> [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify] >> 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect >> on transport >> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] >> 0-<SNIP>_data1-client-2: parent translators are ready, attempting connect >> on transport >> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] >> 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect >> on transport >> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] >> 0-<SNIP>_data1-client-0: changing port to 49153 (from 0) >> Final graph: >> >> >> Sincerely, >> Artem >> >> -- >> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >> <http://www.apkmirror.com/>, Illogical Robot LLC >> beerpla.net | +ArtemRussakovskii >> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >> <http://twitter.com/ArtemR> >> >> >> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii <[email protected]> >> wrote: >> >>> I've added the lru-limit=0 parameter to the mounts, and I see it's taken >>> effect correctly: >>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>> --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" >>> >>> Let's see if it stops crashing or not. >>> >>> Sincerely, >>> Artem >>> >>> -- >>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>> <http://www.apkmirror.com/>, Illogical Robot LLC >>> beerpla.net | +ArtemRussakovskii >>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>> <http://twitter.com/ArtemR> >>> >>> >>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii <[email protected]> >>> wrote: >>> >>>> Hi Nithya, >>>> >>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing >>>> crashes, and no further releases have been made yet. 
>>>> >>>> volume info: >>>> Type: Replicate >>>> Volume ID: ****SNIP**** >>>> Status: Started >>>> Snapshot Count: 0 >>>> Number of Bricks: 1 x 4 = 4 >>>> Transport-type: tcp >>>> Bricks: >>>> Brick1: ****SNIP**** >>>> Brick2: ****SNIP**** >>>> Brick3: ****SNIP**** >>>> Brick4: ****SNIP**** >>>> Options Reconfigured: >>>> cluster.quorum-count: 1 >>>> cluster.quorum-type: fixed >>>> network.ping-timeout: 5 >>>> network.remote-dio: enable >>>> performance.rda-cache-limit: 256MB >>>> performance.readdir-ahead: on >>>> performance.parallel-readdir: on >>>> network.inode-lru-limit: 500000 >>>> performance.md-cache-timeout: 600 >>>> performance.cache-invalidation: on >>>> performance.stat-prefetch: on >>>> features.cache-invalidation-timeout: 600 >>>> features.cache-invalidation: on >>>> cluster.readdir-optimize: on >>>> performance.io-thread-count: 32 >>>> server.event-threads: 4 >>>> client.event-threads: 4 >>>> performance.read-ahead: off >>>> cluster.lookup-optimize: on >>>> performance.cache-size: 1GB >>>> cluster.self-heal-daemon: enable >>>> transport.address-family: inet >>>> nfs.disable: on >>>> performance.client-io-threads: on >>>> cluster.granular-entry-heal: enable >>>> cluster.data-self-heal-algorithm: full >>>> >>>> Sincerely, >>>> Artem >>>> >>>> -- >>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>> beerpla.net | +ArtemRussakovskii >>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>> <http://twitter.com/ArtemR> >>>> >>>> >>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran < >>>> [email protected]> wrote: >>>> >>>>> Hi Artem, >>>>> >>>>> Do you still see the crashes with 5.3? If yes, please try mount the >>>>> volume using the mount option lru-limit=0 and see if that helps. We are >>>>> looking into the crashes and will update when have a fix. >>>>> >>>>> Also, please provide the gluster volume info for the volume in >>>>> question. >>>>> >>>>> >>>>> regards, >>>>> Nithya >>>>> >>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii <[email protected]> >>>>> wrote: >>>>> >>>>>> The fuse crash happened two more times, but this time monit helped >>>>>> recover within 1 minute, so it's a great workaround for now. >>>>>> >>>>>> What's odd is that the crashes are only happening on one of 4 >>>>>> servers, and I don't know why. >>>>>> >>>>>> Sincerely, >>>>>> Artem >>>>>> >>>>>> -- >>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>> beerpla.net | +ArtemRussakovskii >>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>> <http://twitter.com/ArtemR> >>>>>> >>>>>> >>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> The fuse crash happened again yesterday, to another volume. Are >>>>>>> there any mount options that could help mitigate this? >>>>>>> >>>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) task >>>>>>> to watch and restart the mount, which works and recovers the mount point >>>>>>> within a minute. Not ideal, but a temporary workaround. >>>>>>> >>>>>>> By the way, the way to reproduce this "Transport endpoint is not >>>>>>> connected" condition for testing purposes is to kill -9 the right >>>>>>> "glusterfs --process-name fuse" process. 
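>>>>>>> In case it's useful, here's a rough sketch of how I trigger it (the pgrep pattern is just what matches on my boxes, so double-check it before running anything):
>>>>>>> # list the fuse client processes with their full command lines
>>>>>>> pgrep -af 'glusterfs.*process-name fuse'
>>>>>>> # hard-kill the PID belonging to the mount you want to break
>>>>>>> kill -9 <PID>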
>>>>>>> >>>>>>> >>>>>>> monit check: >>>>>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1 >>>>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>>>> if space usage > 90% for 5 times within 15 cycles >>>>>>> then alert else if succeeded for 10 cycles then alert >>>>>>> >>>>>>> >>>>>>> stack trace: >>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>> [0x7fa0249e4329] >>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>> [0x7fa0249e4329] >>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>> The message "E [MSGID: 101191] >>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to >>>>>>> dispatch >>>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>>>> [2019-02-01 23:21:56.164427] >>>>>>> The message "I [MSGID: 108031] >>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times >>>>>>> between >>>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>>>> pending frames: >>>>>>> frame : type(1) op(LOOKUP) >>>>>>> frame : type(0) op(0) >>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>> signal received: 6 >>>>>>> time of crash: >>>>>>> 2019-02-01 23:22:03 >>>>>>> configuration details: >>>>>>> argp 1 >>>>>>> backtrace 1 >>>>>>> dlfcn 1 >>>>>>> libpthread 1 >>>>>>> llistxattr 1 >>>>>>> setfsid 1 >>>>>>> spinlock 1 >>>>>>> epoll.h 1 >>>>>>> xattr.h 1 >>>>>>> st_atim.tv_nsec 1 >>>>>>> package-string: glusterfs 5.3 >>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>>>> >>>>>>> Sincerely, >>>>>>> Artem >>>>>>> >>>>>>> -- >>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC 
>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>> <http://twitter.com/ArtemR> >>>>>>> >>>>>>> >>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> The first (and so far only) crash happened at 2am the next day >>>>>>>> after we upgraded, on only one of four servers and only to one of two >>>>>>>> mounts. >>>>>>>> >>>>>>>> I have no idea what caused it, but yeah, we do have a pretty busy >>>>>>>> site (apkmirror.com), and it caused a disruption for any uploads >>>>>>>> or downloads from that server until I woke up and fixed the mount. >>>>>>>> >>>>>>>> I wish I could be more helpful but all I have is that stack trace. >>>>>>>> >>>>>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>>>>> >>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> Hi Artem, >>>>>>>>> >>>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (ie, >>>>>>>>> as a clone of other bugs where recent discussions happened), and >>>>>>>>> marked it >>>>>>>>> as a blocker for glusterfs-5.4 release. >>>>>>>>> >>>>>>>>> We already have fixes for log flooding - >>>>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>>>> identifying and fixing the issue seen with crash. >>>>>>>>> >>>>>>>>> Can you please tell if the crashes happened as soon as upgrade ? >>>>>>>>> or was there any particular pattern you observed before the crash. >>>>>>>>> >>>>>>>>> -Amar >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I >>>>>>>>>> already got a crash which others have mentioned in >>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to >>>>>>>>>> unmount, kill gluster, and remount: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>> [0x7fccd705b218] ) 
2-dict: dict is NULL [Invalid argument] >>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times >>>>>>>>>> between >>>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to >>>>>>>>>> dispatch >>>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>>>> pending frames: >>>>>>>>>> frame : type(1) op(READ) >>>>>>>>>> frame : type(1) op(OPEN) >>>>>>>>>> frame : type(0) op(0) >>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>> signal received: 6 >>>>>>>>>> time of crash: >>>>>>>>>> 2019-01-31 09:38:04 >>>>>>>>>> configuration details: >>>>>>>>>> argp 1 >>>>>>>>>> backtrace 1 >>>>>>>>>> dlfcn 1 >>>>>>>>>> libpthread 1 >>>>>>>>>> llistxattr 1 >>>>>>>>>> setfsid 1 >>>>>>>>>> spinlock 1 >>>>>>>>>> epoll.h 1 >>>>>>>>>> xattr.h 1 >>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>>>> >>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>>>> >>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>>>> >>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>>>> >>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>>>> --------- >>>>>>>>>> >>>>>>>>>> Do the pending patches fix the crash or only the repeated >>>>>>>>>> warnings? I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>>>> not too sure how to make it core dump. >>>>>>>>>> >>>>>>>>>> If it's not fixed by the patches above, has anyone already opened >>>>>>>>>> a ticket for the crashes that I can join and monitor? This is going >>>>>>>>>> to >>>>>>>>>> create a massive problem for us since production systems are >>>>>>>>>> crashing. >>>>>>>>>> >>>>>>>>>> Thanks. 
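>>>>>>>>>> If there's a standard way to capture a core from the fuse client, I'm happy to set it up; my guess would be something along these lines, but please point me at the right procedure:
>>>>>>>>>> # allow core dumps in the shell the mount is started from (my assumption about how it's launched)
>>>>>>>>>> ulimit -c unlimited
>>>>>>>>>> # check where the kernel writes core files
>>>>>>>>>> cat /proc/sys/kernel/core_pattern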
>>>>>>>>>> >>>>>>>>>> Sincerely, >>>>>>>>>> Artem >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> Also, not sure if related or not, but I got a ton of these >>>>>>>>>>>> "Failed to dispatch handler" in my logs as well. Many people have >>>>>>>>>>>> been >>>>>>>>>>>> commenting about this issue here >>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses >>>>>>>>>>> this. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> ==> mnt-SITE_data1.log <== >>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>> ==> mnt-SITE_data3.log <== >>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed >>>>>>>>>>>>> to dispatch >>>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] >>>>>>>>>>>>> and >>>>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] >>>>>>>>>>>>> 2-SITE_data3-replicate-0: >>>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times >>>>>>>>>>>>> between >>>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>>>> ==> mnt-SITE_data1.log <== >>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] >>>>>>>>>>>>> 2-SITE_data1-replicate-0: >>>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times >>>>>>>>>>>>> between >>>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed >>>>>>>>>>>>> to dispatch >>>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] >>>>>>>>>>>>> and >>>>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] >>>>>>>>>>>>> 2-SITE_data1-replicate-0: >>>>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>>>> ==> mnt-SITE_data3.log <== >>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] >>>>>>>>>>>>> 2-SITE_data3-replicate-0: >>>>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>>>> ==> mnt-SITE_data1.log <== >>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed >>>>>>>>>>>>> to dispatch >>>>>>>>>>>>> handler >>>>>>>>>>>> >>>>>>>>>>>> 
>>>>>>>>>>>> I'm hoping raising the issue here on the mailing list may bring >>>>>>>>>>>> some additional eyeballs and get them both fixed. >>>>>>>>>>>> >>>>>>>>>>>> Thanks. >>>>>>>>>>>> >>>>>>>>>>>> Sincerely, >>>>>>>>>>>> Artem >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I found a similar issue here: >>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's >>>>>>>>>>>>> a comment from 3 days ago from someone else with 5.3 who started >>>>>>>>>>>>> seeing the >>>>>>>>>>>>> spam. >>>>>>>>>>>>> >>>>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> +Milind Changire <[email protected]> Can you check why this >>>>>>>>>>> message is logged and send a fix? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks. >>>>>>>>>>>>> >>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>> Artem >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>> [email protected] >>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>> Gluster-users mailing list >>>>>>>>>> [email protected] >>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Amar Tumballi (amarts) >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>> Gluster-users mailing list >>>>>> [email protected] >>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>>
_______________________________________________ Gluster-users mailing list [email protected] https://lists.gluster.org/mailman/listinfo/gluster-users
