Hi Nithya,

Unfortunately, I just had another crash on the same server, with performance.write-behind still set to off. I'll email the core file privately.
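(For reference, a minimal way to double-check and toggle that setting from the gluster CLI — the volume name below is a placeholder:

  gluster volume get <VOLNAME> performance.write-behind
  gluster volume set <VOLNAME> performance.write-behind off

gluster volume get reports the effective value even when the option was never explicitly reconfigured.)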
[2019-02-19 19:50:39.511743] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7f9598991329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7f9598ba2af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7f95a137d218] ) 2-dict: dict is NULL [Invalid argument]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 95 times between [2019-02-19 19:49:07.655620] and [2019-02-19 19:50:39.499284]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-<SNIP>_data3-replicate-0: selecting local read_child <SNIP>_data3-client-3" repeated 56 times between [2019-02-19 19:49:07.602370] and [2019-02-19 19:50:42.912766]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-02-19 19:50:43
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f95a138864c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f95a1392cb6]
/lib64/libc.so.6(+0x36160)[0x7f95a054f160]
/lib64/libc.so.6(gsignal+0x110)[0x7f95a054f0e0]
/lib64/libc.so.6(abort+0x151)[0x7f95a05506c1]
/lib64/libc.so.6(+0x2e6fa)[0x7f95a05476fa]
/lib64/libc.so.6(+0x2e772)[0x7f95a0547772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f95a08dd0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f95994f0c9d]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f9599503ba1]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f9599788f3f]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7f95a1153820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f95a1153b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f95a1150063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f959aea00b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f95a13e64c3]
/lib64/libpthread.so.0(+0x7559)[0x7f95a08da559]
/lib64/libc.so.6(clone+0x3f)[0x7f95a061181f]
---------
[2019-02-19 19:51:34.425106] I [MSGID: 100030] [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP>_data3 /mnt/<SNIP>_data3)
[2019-02-19 19:51:34.435206] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-02-19 19:51:34.450272] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2019-02-19 19:51:34.450394] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
[2019-02-19 19:51:34.450488] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3

Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>

On Tue, Feb 12, 2019 at 12:38 AM Nithya Balachandran <[email protected]> wrote:

> Not yet but we are discussing an interim release. It is going to take a couple of days to review the fixes so not before then. We will update on the list with dates once we decide.
>
> On Tue, 12 Feb 2019 at 11:46, Artem Russakovskii <[email protected]> wrote:
>
>> Awesome. But is there a release schedule and an ETA for when these will be out in the repos?
>>
>> On Mon, Feb 11, 2019, 9:34 PM Raghavendra Gowdappa <[email protected]> wrote:
>>
>>> On Tue, Feb 12, 2019 at 10:24 AM Artem Russakovskii <[email protected]> wrote:
>>>
>>>> Great job identifying the issue!
>>>>
>>>> Any ETA on the next release with the logging and crash fixes in it?
>>>
>>> I've marked the write-behind corruption as a blocker for release-6. The logging fixes are already in the codebase.
>>>
>>>> On Mon, Feb 11, 2019, 7:19 PM Raghavendra Gowdappa <[email protected]> wrote:
>>>>
>>>>> On Mon, Feb 11, 2019 at 3:49 PM João Baúto <[email protected]> wrote:
>>>>>
>>>>>> Although I don't have these error messages, I'm having fuse crashes as frequently as you. I have disabled write-behind and the mount has been running over the weekend with heavy usage and no issues.
>>>>>
>>>>> The issue you are facing will likely be fixed by patch [1]. Xavi, Nithya, and I were able to identify the corruption in write-behind.
>>>>>
>>>>> [1] https://review.gluster.org/22189
>>>>>
>>>>>> I can provide coredumps from before disabling write-behind if needed. I opened a BZ report <https://bugzilla.redhat.com/show_bug.cgi?id=1671014> with the crashes that I was having.
>>>>>>
>>>>>> *João Baúto*
>>>>>> ---------------
>>>>>> *Scientific Computing and Software Platform*
>>>>>> Champalimaud Research
>>>>>> Champalimaud Center for the Unknown
>>>>>> Av. Brasília, Doca de Pedrouços
>>>>>> 1400-038 Lisbon, Portugal
>>>>>> fchampalimaud.org <https://www.fchampalimaud.org/>
>>>>>>
>>>>>> Artem Russakovskii <[email protected]> wrote on Saturday, 9/02/2019 at 22:18:
>>>>>>
>>>>>>> Alright. I've enabled core-dumping (hopefully), so now I'm waiting for the next crash to see if it dumps a core for you guys to remotely debug.
>>>>>>>
>>>>>>> Then I can consider setting performance.write-behind to off and monitoring for further crashes.
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Artem
>>>>>>>
>>>>>>> --
>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>>>>>>>
>>>>>>> On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Nithya,
>>>>>>>>>
>>>>>>>>> I can try to disable write-behind as long as it doesn't heavily impact performance for us. Which option is it exactly? I don't see it set in my list of changed volume variables that I sent you guys earlier.
>>>>>>>>
>>>>>>>> The option is performance.write-behind.
>>>>>>>>
>>>>>>>>> Sincerely,
>>>>>>>>> Artem
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>>>>>>>>>
>>>>>>>>> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Artem,
>>>>>>>>>>
>>>>>>>>>> We have found the cause of one crash. Unfortunately, we have not managed to reproduce the one you reported, so we don't know if it is the same cause.
>>>>>>>>>>
>>>>>>>>>> Can you disable write-behind on the volume and let us know if it solves the problem? If yes, it is likely to be the same issue.
>>>>>>>>>>
>>>>>>>>>> regards,
>>>>>>>>>> Nithya
>>>>>>>>>>
>>>>>>>>>> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sorry to disappoint, but the crash just happened again, so lru-limit=0 didn't help.
>>>>>>>>>>>
>>>>>>>>>>> Here's the snippet of the crash and the subsequent remount by monit.
>>>>>>>>>>>
>>>>>>>>>>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7f4402b99329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7f440b6b5218] ) 0-dict: dict is NULL [Invalid argument]
>>>>>>>>>>> The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: selecting local read_child <SNIP>_data1-client-3" repeated 39 times between [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]
>>>>>>>>>>> The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 515 times between [2019-02-08 01:11:17.932515] and [2019-02-08 01:13:09.311554]
>>>>>>>>>>> pending frames:
>>>>>>>>>>> frame : type(1) op(LOOKUP)
>>>>>>>>>>> frame : type(0) op(0)
>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git
>>>>>>>>>>> signal received: 6
>>>>>>>>>>> time of crash:
>>>>>>>>>>> 2019-02-08 01:13:09
>>>>>>>>>>> configuration details:
>>>>>>>>>>> argp 1
>>>>>>>>>>> backtrace 1
>>>>>>>>>>> dlfcn 1
>>>>>>>>>>> libpthread 1
>>>>>>>>>>> llistxattr 1
>>>>>>>>>>> setfsid 1
>>>>>>>>>>> spinlock 1
>>>>>>>>>>> epoll.h 1
>>>>>>>>>>> xattr.h 1
>>>>>>>>>>> st_atim.tv_nsec 1
>>>>>>>>>>> package-string: glusterfs 5.3
>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7f440a887160]
>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]
>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772]
>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]
>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]
>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1]
>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f]
>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820]
>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f]
>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063]
>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2]
>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3]
>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559]
>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f]
>>>>>>>>>>> ---------
>>>>>>>>>>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP>_data1 /mnt/<SNIP>_data1)
>>>>>>>>>>> [2019-02-08 01:13:35.637830] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>>>>>>>>>>> [2019-02-08 01:13:35.651405] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
>>>>>>>>>>> [2019-02-08 01:13:35.651628] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
>>>>>>>>>>> [2019-02-08 01:13:35.651747] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
>>>>>>>>>>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-0: parent translators are ready, attempting connect on transport
>>>>>>>>>>> [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect on transport
>>>>>>>>>>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-2: parent translators are ready, attempting connect on transport
>>>>>>>>>>> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect on transport
>>>>>>>>>>> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-<SNIP>_data1-client-0: changing port to 49153 (from 0)
>>>>>>>>>>> Final graph:
>>>>>>>>>>>
>>>>>>>>>>> Sincerely,
>>>>>>>>>>> Artem
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I've added the lru-limit=0 parameter to the mounts, and I see it's taken effect correctly: "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>"
>>>>>>>>>>>>
>>>>>>>>>>>> Let's see if it stops crashing or not.
>>>>>>>>>>>>
>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>> Artem
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Nithya,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing crashes, and no further releases have been made yet.
>>>>>>>>>>>>>
>>>>>>>>>>>>> volume info:
>>>>>>>>>>>>> Type: Replicate
>>>>>>>>>>>>> Volume ID: ****SNIP****
>>>>>>>>>>>>> Status: Started
>>>>>>>>>>>>> Snapshot Count: 0
>>>>>>>>>>>>> Number of Bricks: 1 x 4 = 4
>>>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>>>> Bricks:
>>>>>>>>>>>>> Brick1: ****SNIP****
>>>>>>>>>>>>> Brick2: ****SNIP****
>>>>>>>>>>>>> Brick3: ****SNIP****
>>>>>>>>>>>>> Brick4: ****SNIP****
>>>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>>>> cluster.quorum-count: 1
>>>>>>>>>>>>> cluster.quorum-type: fixed
>>>>>>>>>>>>> network.ping-timeout: 5
>>>>>>>>>>>>> network.remote-dio: enable
>>>>>>>>>>>>> performance.rda-cache-limit: 256MB
>>>>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>>>> performance.parallel-readdir: on
>>>>>>>>>>>>> network.inode-lru-limit: 500000
>>>>>>>>>>>>> performance.md-cache-timeout: 600
>>>>>>>>>>>>> performance.cache-invalidation: on
>>>>>>>>>>>>> performance.stat-prefetch: on
>>>>>>>>>>>>> features.cache-invalidation-timeout: 600
>>>>>>>>>>>>> features.cache-invalidation: on
>>>>>>>>>>>>> cluster.readdir-optimize: on
>>>>>>>>>>>>> performance.io-thread-count: 32
>>>>>>>>>>>>> server.event-threads: 4
>>>>>>>>>>>>> client.event-threads: 4
>>>>>>>>>>>>> performance.read-ahead: off
>>>>>>>>>>>>> cluster.lookup-optimize: on
>>>>>>>>>>>>> performance.cache-size: 1GB
>>>>>>>>>>>>> cluster.self-heal-daemon: enable
>>>>>>>>>>>>> transport.address-family: inet
>>>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>>>> performance.client-io-threads: on
>>>>>>>>>>>>> cluster.granular-entry-heal: enable
>>>>>>>>>>>>> cluster.data-self-heal-algorithm: full
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>> Artem
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Artem,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you still see the crashes with 5.3? If yes, please try mounting the volume using the mount option lru-limit=0 and see if that helps. We are looking into the crashes and will update when we have a fix.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, please provide the gluster volume info for the volume in question.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> regards,
>>>>>>>>>>>>>> Nithya
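(For concreteness, the mount-option form described above looks like the following — volume and mount names are placeholders, and the 5.x mount helper translates the option into the --lru-limit flag visible in the process args quoted earlier in the thread:

  mount -t glusterfs -o lru-limit=0 localhost:/<VOLNAME> /mnt/<VOLNAME>
)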
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The fuse crash happened two more times, but this time monit helped recover within 1 minute, so it's a great workaround for now.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What's odd is that the crashes are only happening on one of 4 servers, and I don't know why.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>>>> Artem
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The fuse crash happened again yesterday, to another volume. Are there any mount options that could help mitigate this?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) task to watch and restart the mount, which works and recovers the mount point within a minute. Not ideal, but a temporary workaround.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> By the way, the way to reproduce this "Transport endpoint is not connected" condition for testing purposes is to kill -9 the right "glusterfs --process-name fuse" process.
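(A minimal sketch of finding that process, assuming one fuse client per mount and GNU procps; the pattern is illustrative and should be narrowed to the mount being tested:

  # list glusterfs fuse clients with their full command lines
  pgrep -af 'glusterfs.*--process-name fuse'
  # then simulate the crash for one mount; destructive, test systems only
  kill -9 <PID>
)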
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> monit check:
>>>>>>>>>>>>>>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1
>>>>>>>>>>>>>>>> start program = "/bin/mount /mnt/glusterfs_data1"
>>>>>>>>>>>>>>>> stop program = "/bin/umount /mnt/glusterfs_data1"
>>>>>>>>>>>>>>>> if space usage > 90% for 5 times within 15 cycles then alert
>>>>>>>>>>>>>>>> else if succeeded for 10 cycles then alert
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> stack trace:
>>>>>>>>>>>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
>>>>>>>>>>>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 26 times between [2019-02-01 23:21:20.857333] and [2019-02-01 23:21:56.164427]
>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-3" repeated 27 times between [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036]
>>>>>>>>>>>>>>>> pending frames:
>>>>>>>>>>>>>>>> frame : type(1) op(LOOKUP)
>>>>>>>>>>>>>>>> frame : type(0) op(0)
>>>>>>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git
>>>>>>>>>>>>>>>> signal received: 6
>>>>>>>>>>>>>>>> time of crash:
>>>>>>>>>>>>>>>> 2019-02-01 23:22:03
>>>>>>>>>>>>>>>> configuration details:
>>>>>>>>>>>>>>>> argp 1
>>>>>>>>>>>>>>>> backtrace 1
>>>>>>>>>>>>>>>> dlfcn 1
>>>>>>>>>>>>>>>> libpthread 1
>>>>>>>>>>>>>>>> llistxattr 1
>>>>>>>>>>>>>>>> setfsid 1
>>>>>>>>>>>>>>>> spinlock 1
>>>>>>>>>>>>>>>> epoll.h 1
>>>>>>>>>>>>>>>> xattr.h 1
>>>>>>>>>>>>>>>> st_atim.tv_nsec 1
>>>>>>>>>>>>>>>> package-string: glusterfs 5.3
>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c]
>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6]
>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160]
>>>>>>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0]
>>>>>>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1]
>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa]
>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772]
>>>>>>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8]
>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]
>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1]
>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f]
>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820]
>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f]
>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063]
>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2]
>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3]
>>>>>>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559]
>>>>>>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>>>>> Artem
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The first (and so far only) crash happened at 2am the next day after we upgraded, on only one of four servers and only to one of two mounts.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have no idea what caused it, but yeah, we do have a pretty busy site (apkmirror.com), and it caused a disruption for any uploads or downloads from that server until I woke up and fixed the mount.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I wish I could be more helpful but all I have is that stack trace.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm glad it's a blocker and will hopefully be resolved soon.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Artem,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (i.e., as a clone of other bugs where recent discussions happened), and marked it as a blocker for the glusterfs-5.4 release.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We already have fixes for the log flooding (https://review.gluster.org/22128), and are in the process of identifying and fixing the issue seen with the crash.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Can you please tell us if the crashes happened as soon as you upgraded, or was there any particular pattern you observed before the crash?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -Amar
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Within 24 hours after updating from rock-solid 4.1 to 5.3, I already got a crash which others have mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to unmount, kill gluster, and remount:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>>>>>>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-3" repeated 5 times between [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061]
>>>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 72 times between [2019-01-31 09:37:53.746741] and [2019-01-31 09:38:04.696993]
>>>>>>>>>>>>>>>>>>> pending frames:
>>>>>>>>>>>>>>>>>>> frame : type(1) op(READ)
>>>>>>>>>>>>>>>>>>> frame : type(1) op(OPEN)
>>>>>>>>>>>>>>>>>>> frame : type(0) op(0)
>>>>>>>>>>>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git
>>>>>>>>>>>>>>>>>>> signal received: 6
>>>>>>>>>>>>>>>>>>> time of crash:
>>>>>>>>>>>>>>>>>>> 2019-01-31 09:38:04
>>>>>>>>>>>>>>>>>>> configuration details:
>>>>>>>>>>>>>>>>>>> argp 1
>>>>>>>>>>>>>>>>>>> backtrace 1
>>>>>>>>>>>>>>>>>>> dlfcn 1
>>>>>>>>>>>>>>>>>>> libpthread 1
>>>>>>>>>>>>>>>>>>> llistxattr 1
>>>>>>>>>>>>>>>>>>> setfsid 1
>>>>>>>>>>>>>>>>>>> spinlock 1
>>>>>>>>>>>>>>>>>>> epoll.h 1
>>>>>>>>>>>>>>>>>>> xattr.h 1
>>>>>>>>>>>>>>>>>>> st_atim.tv_nsec 1
>>>>>>>>>>>>>>>>>>> package-string: glusterfs 5.3
>>>>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]
>>>>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]
>>>>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160]
>>>>>>>>>>>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]
>>>>>>>>>>>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]
>>>>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]
>>>>>>>>>>>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772]
>>>>>>>>>>>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]
>>>>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]
>>>>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]
>>>>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]
>>>>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]
>>>>>>>>>>>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]
>>>>>>>>>>>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]
>>>>>>>>>>>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]
>>>>>>>>>>>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]
>>>>>>>>>>>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]
>>>>>>>>>>>>>>>>>>> ---------
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Do the pending patches fix the crash or only the repeated warnings? I'm running glusterfs on openSUSE 15.0 installed via http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, and I'm not too sure how to make it core dump.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If it's not fixed by the patches above, has anyone already opened a ticket for the crashes that I can join and monitor? This is going to create a massive problem for us since production systems are crashing.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>>>>>>>> Artem
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Also, not sure if related or not, but I got a ton of these "Failed to dispatch handler" messages in my logs as well. Many people have been commenting about this issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1651246.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses this.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <==
>>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <==
>>>>>>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 413 times between [2019-01-30 20:36:23.881090] and [2019-01-30 20:38:20.015593]
>>>>>>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0" repeated 42 times between [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306]
>>>>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <==
>>>>>>>>>>>>>>>>>>>>>> The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0" repeated 50 times between [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789]
>>>>>>>>>>>>>>>>>>>>>> The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and [2019-01-30 20:38:20.546355]
>>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0
>>>>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data3.log <==
>>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0
>>>>>>>>>>>>>>>>>>>>>> ==> mnt-SITE_data1.log <==
>>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'm hoping raising the issue here on the mailing list may bring some additional eyeballs and get them both fixed.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>>>>>>>>>> Artem
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I found a similar issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's a comment from 3 days ago from someone else with 5.3 who started seeing the spam.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Here's the message that repeats over and over:
>>>>>>>>>>>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> +Milind Changire <[email protected]> Can you check why this message is logged and send a fix?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Is there any fix for this issue?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>>>>>>>>>>> Artem
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>>>>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Amar Tumballi (amarts)
_______________________________________________
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
