Hi Nithya, I can try disabling write-behind as long as it doesn't heavily impact performance for us. Which option is it exactly? I don't see it set in the list of changed volume options that I sent you earlier.
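If the option you mean is performance.write-behind (that's just my guess, please correct me if it's a different knob), I'd plan to run something like this on the affected volumes:

# check the current value first
gluster volume get <SNIP>_data1 performance.write-behind
# then turn it off on the volume
gluster volume set <SNIP>_data1 performance.write-behind off

Does the fuse client pick up that change automatically, or should I remount after setting it?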
Sincerely, Artem -- Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii> | @ArtemR <http://twitter.com/ArtemR> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran <[email protected]> wrote: > Hi Artem, > > We have found the cause of one crash. Unfortunately we have not managed to > reproduce the one you reported so we don't know if it is the same cause. > > Can you disable write-behind on the volume and let us know if it solves > the problem? If yes, it is likely to be the same issue. > > > regards, > Nithya > > On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <[email protected]> > wrote: > >> Sorry to disappoint, but the crash just happened again, so lru-limit=0 >> didn't help. >> >> Here's the snippet of the crash and the subsequent remount by monit. >> >> >> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] >> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >> [0x7f4402b99329] >> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >> [0x7f440b6b5218] ) 0-dict: dict is NULL [In >> valid argument] >> The message "I [MSGID: 108031] >> [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: >> selecting local read_child <SNIP>_data1-client-3" repeated 39 times between >> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604] >> The message "E [MSGID: 101191] >> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and >> [2019-02-08 01:13:09.311554] >> pending frames: >> frame : type(1) op(LOOKUP) >> frame : type(0) op(0) >> patchset: git://git.gluster.org/glusterfs.git >> signal received: 6 >> time of crash: >> 2019-02-08 01:13:09 >> configuration details: >> argp 1 >> backtrace 1 >> dlfcn 1 >> libpthread 1 >> llistxattr 1 >> setfsid 1 >> spinlock 1 >> epoll.h 1 >> xattr.h 1 >> st_atim.tv_nsec 1 >> package-string: glusterfs 5.3 >> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c] >> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6] >> /lib64/libc.so.6(+0x36160)[0x7f440a887160] >> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0] >> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1] >> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa] >> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772] >> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8] >> >> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d] >> >> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1] >> >> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f] >> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820] >> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f] >> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063] >> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2] >> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3] >> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559] >> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f] >> --------- >> [2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main] >> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 >> (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse >> --volfile-server=localhost --volfile-id=/<SNIP>_data1 /mnt/<SNIP>_data1) >> [2019-02-08 01:13:35.637830] I 
[MSGID: 101190] >> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >> with index 1 >> [2019-02-08 01:13:35.651405] I [MSGID: 101190] >> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >> with index 2 >> [2019-02-08 01:13:35.651628] I [MSGID: 101190] >> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >> with index 3 >> [2019-02-08 01:13:35.651747] I [MSGID: 101190] >> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread >> with index 4 >> [2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify] >> 0-<SNIP>_data1-client-0: parent translators are ready, attempting connect >> on transport >> [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify] >> 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect >> on transport >> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] >> 0-<SNIP>_data1-client-2: parent translators are ready, attempting connect >> on transport >> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] >> 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect >> on transport >> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] >> 0-<SNIP>_data1-client-0: changing port to 49153 (from 0) >> Final graph: >> >> >> Sincerely, >> Artem >> >> -- >> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >> <http://www.apkmirror.com/>, Illogical Robot LLC >> beerpla.net | +ArtemRussakovskii >> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >> <http://twitter.com/ArtemR> >> >> >> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii <[email protected]> >> wrote: >> >>> I've added the lru-limit=0 parameter to the mounts, and I see it's taken >>> effect correctly: >>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse >>> --volfile-server=localhost --volfile-id=/<SNIP> /mnt/<SNIP>" >>> >>> Let's see if it stops crashing or not. >>> >>> Sincerely, >>> Artem >>> >>> -- >>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>> <http://www.apkmirror.com/>, Illogical Robot LLC >>> beerpla.net | +ArtemRussakovskii >>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>> <http://twitter.com/ArtemR> >>> >>> >>> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii <[email protected]> >>> wrote: >>> >>>> Hi Nithya, >>>> >>>> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing >>>> crashes, and no further releases have been made yet. 
>>>> >>>> volume info: >>>> Type: Replicate >>>> Volume ID: ****SNIP**** >>>> Status: Started >>>> Snapshot Count: 0 >>>> Number of Bricks: 1 x 4 = 4 >>>> Transport-type: tcp >>>> Bricks: >>>> Brick1: ****SNIP**** >>>> Brick2: ****SNIP**** >>>> Brick3: ****SNIP**** >>>> Brick4: ****SNIP**** >>>> Options Reconfigured: >>>> cluster.quorum-count: 1 >>>> cluster.quorum-type: fixed >>>> network.ping-timeout: 5 >>>> network.remote-dio: enable >>>> performance.rda-cache-limit: 256MB >>>> performance.readdir-ahead: on >>>> performance.parallel-readdir: on >>>> network.inode-lru-limit: 500000 >>>> performance.md-cache-timeout: 600 >>>> performance.cache-invalidation: on >>>> performance.stat-prefetch: on >>>> features.cache-invalidation-timeout: 600 >>>> features.cache-invalidation: on >>>> cluster.readdir-optimize: on >>>> performance.io-thread-count: 32 >>>> server.event-threads: 4 >>>> client.event-threads: 4 >>>> performance.read-ahead: off >>>> cluster.lookup-optimize: on >>>> performance.cache-size: 1GB >>>> cluster.self-heal-daemon: enable >>>> transport.address-family: inet >>>> nfs.disable: on >>>> performance.client-io-threads: on >>>> cluster.granular-entry-heal: enable >>>> cluster.data-self-heal-algorithm: full >>>> >>>> Sincerely, >>>> Artem >>>> >>>> -- >>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>> beerpla.net | +ArtemRussakovskii >>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>> <http://twitter.com/ArtemR> >>>> >>>> >>>> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran < >>>> [email protected]> wrote: >>>> >>>>> Hi Artem, >>>>> >>>>> Do you still see the crashes with 5.3? If yes, please try mount the >>>>> volume using the mount option lru-limit=0 and see if that helps. We are >>>>> looking into the crashes and will update when have a fix. >>>>> >>>>> Also, please provide the gluster volume info for the volume in >>>>> question. >>>>> >>>>> >>>>> regards, >>>>> Nithya >>>>> >>>>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii <[email protected]> >>>>> wrote: >>>>> >>>>>> The fuse crash happened two more times, but this time monit helped >>>>>> recover within 1 minute, so it's a great workaround for now. >>>>>> >>>>>> What's odd is that the crashes are only happening on one of 4 >>>>>> servers, and I don't know why. >>>>>> >>>>>> Sincerely, >>>>>> Artem >>>>>> >>>>>> -- >>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>> beerpla.net | +ArtemRussakovskii >>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>> <http://twitter.com/ArtemR> >>>>>> >>>>>> >>>>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> The fuse crash happened again yesterday, to another volume. Are >>>>>>> there any mount options that could help mitigate this? >>>>>>> >>>>>>> In the meantime, I set up a monit (https://mmonit.com/monit/) task >>>>>>> to watch and restart the mount, which works and recovers the mount point >>>>>>> within a minute. Not ideal, but a temporary workaround. >>>>>>> >>>>>>> By the way, the way to reproduce this "Transport endpoint is not >>>>>>> connected" condition for testing purposes is to kill -9 the right >>>>>>> "glusterfs --process-name fuse" process. 
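>>>>>>> In case it's useful, here's a rough sketch of how I trigger it (the pgrep pattern is just what matches on my boxes, so double-check it before running anything):
>>>>>>> # list the fuse client processes with their full command lines
>>>>>>> pgrep -af 'glusterfs.*process-name fuse'
>>>>>>> # hard-kill the PID belonging to the mount you want to break
>>>>>>> kill -9 <PID>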
>>>>>>> >>>>>>> >>>>>>> monit check: >>>>>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1 >>>>>>> start program = "/bin/mount /mnt/glusterfs_data1" >>>>>>> stop program = "/bin/umount /mnt/glusterfs_data1" >>>>>>> if space usage > 90% for 5 times within 15 cycles >>>>>>> then alert else if succeeded for 10 cycles then alert >>>>>>> >>>>>>> >>>>>>> stack trace: >>>>>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] >>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>> [0x7fa0249e4329] >>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] >>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>> [0x7fa0249e4329] >>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument] >>>>>>> The message "E [MSGID: 101191] >>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to >>>>>>> dispatch >>>>>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and >>>>>>> [2019-02-01 23:21:56.164427] >>>>>>> The message "I [MSGID: 108031] >>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: >>>>>>> selecting local read_child SITE_data3-client-3" repeated 27 times >>>>>>> between >>>>>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036] >>>>>>> pending frames: >>>>>>> frame : type(1) op(LOOKUP) >>>>>>> frame : type(0) op(0) >>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>> signal received: 6 >>>>>>> time of crash: >>>>>>> 2019-02-01 23:22:03 >>>>>>> configuration details: >>>>>>> argp 1 >>>>>>> backtrace 1 >>>>>>> dlfcn 1 >>>>>>> libpthread 1 >>>>>>> llistxattr 1 >>>>>>> setfsid 1 >>>>>>> spinlock 1 >>>>>>> epoll.h 1 >>>>>>> xattr.h 1 >>>>>>> st_atim.tv_nsec 1 >>>>>>> package-string: glusterfs 5.3 >>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c] >>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6] >>>>>>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160] >>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0] >>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1] >>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa] >>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772] >>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8] >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d] >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1] >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f] >>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820] >>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f] >>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063] >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2] >>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3] >>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559] >>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f] >>>>>>> >>>>>>> Sincerely, >>>>>>> Artem >>>>>>> >>>>>>> -- >>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror >>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC 
>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>> <http://twitter.com/ArtemR> >>>>>>> >>>>>>> >>>>>>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> The first (and so far only) crash happened at 2am the next day >>>>>>>> after we upgraded, on only one of four servers and only to one of two >>>>>>>> mounts. >>>>>>>> >>>>>>>> I have no idea what caused it, but yeah, we do have a pretty busy >>>>>>>> site (apkmirror.com), and it caused a disruption for any uploads >>>>>>>> or downloads from that server until I woke up and fixed the mount. >>>>>>>> >>>>>>>> I wish I could be more helpful but all I have is that stack trace. >>>>>>>> >>>>>>>> I'm glad it's a blocker and will hopefully be resolved soon. >>>>>>>> >>>>>>>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> Hi Artem, >>>>>>>>> >>>>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (ie, >>>>>>>>> as a clone of other bugs where recent discussions happened), and >>>>>>>>> marked it >>>>>>>>> as a blocker for glusterfs-5.4 release. >>>>>>>>> >>>>>>>>> We already have fixes for log flooding - >>>>>>>>> https://review.gluster.org/22128, and are the process of >>>>>>>>> identifying and fixing the issue seen with crash. >>>>>>>>> >>>>>>>>> Can you please tell if the crashes happened as soon as upgrade ? >>>>>>>>> or was there any particular pattern you observed before the crash. >>>>>>>>> >>>>>>>>> -Amar >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I >>>>>>>>>> already got a crash which others have mentioned in >>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to >>>>>>>>>> unmount, kill gluster, and remount: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] >>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] >>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] >>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] >>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>> [0x7fcccafcd329] >>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>> [0x7fccd705b218] ) 
2-dict: dict is NULL [Invalid argument] >>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: >>>>>>>>>> selecting local read_child SITE_data1-client-3" repeated 5 times >>>>>>>>>> between >>>>>>>>>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061] >>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to >>>>>>>>>> dispatch >>>>>>>>>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and >>>>>>>>>> [2019-01-31 09:38:04.696993] >>>>>>>>>> pending frames: >>>>>>>>>> frame : type(1) op(READ) >>>>>>>>>> frame : type(1) op(OPEN) >>>>>>>>>> frame : type(0) op(0) >>>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>>> signal received: 6 >>>>>>>>>> time of crash: >>>>>>>>>> 2019-01-31 09:38:04 >>>>>>>>>> configuration details: >>>>>>>>>> argp 1 >>>>>>>>>> backtrace 1 >>>>>>>>>> dlfcn 1 >>>>>>>>>> libpthread 1 >>>>>>>>>> llistxattr 1 >>>>>>>>>> setfsid 1 >>>>>>>>>> spinlock 1 >>>>>>>>>> epoll.h 1 >>>>>>>>>> xattr.h 1 >>>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>>> package-string: glusterfs 5.3 >>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c] >>>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6] >>>>>>>>>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160] >>>>>>>>>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0] >>>>>>>>>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1] >>>>>>>>>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa] >>>>>>>>>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772] >>>>>>>>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8] >>>>>>>>>> >>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d] >>>>>>>>>> >>>>>>>>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778] >>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820] >>>>>>>>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f] >>>>>>>>>> >>>>>>>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063] >>>>>>>>>> >>>>>>>>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2] >>>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3] >>>>>>>>>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559] >>>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f] >>>>>>>>>> --------- >>>>>>>>>> >>>>>>>>>> Do the pending patches fix the crash or only the repeated >>>>>>>>>> warnings? I'm running glusterfs on OpenSUSE 15.0 installed via >>>>>>>>>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, >>>>>>>>>> not too sure how to make it core dump. >>>>>>>>>> >>>>>>>>>> If it's not fixed by the patches above, has anyone already opened >>>>>>>>>> a ticket for the crashes that I can join and monitor? This is going >>>>>>>>>> to >>>>>>>>>> create a massive problem for us since production systems are >>>>>>>>>> crashing. >>>>>>>>>> >>>>>>>>>> Thanks. 
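>>>>>>>>>> If there's a standard way to capture a core from the fuse client, I'm happy to set it up; my guess would be something along these lines, but please point me at the right procedure:
>>>>>>>>>> # allow core dumps in the shell the mount is started from (my assumption about how it's launched)
>>>>>>>>>> ulimit -c unlimited
>>>>>>>>>> # check where the kernel writes core files
>>>>>>>>>> cat /proc/sys/kernel/core_pattern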
>>>>>>>>>> >>>>>>>>>> Sincerely, >>>>>>>>>> Artem >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> Also, not sure if related or not, but I got a ton of these >>>>>>>>>>>> "Failed to dispatch handler" in my logs as well. Many people have >>>>>>>>>>>> been >>>>>>>>>>>> commenting about this issue here >>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses >>>>>>>>>>> this. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> ==> mnt-SITE_data1.log <== >>>>>>>>>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] >>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>> ==> mnt-SITE_data3.log <== >>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed >>>>>>>>>>>>> to dispatch >>>>>>>>>>>>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] >>>>>>>>>>>>> and >>>>>>>>>>>>> [2019-01-30 20:38:20.015593] >>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] >>>>>>>>>>>>> 2-SITE_data3-replicate-0: >>>>>>>>>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times >>>>>>>>>>>>> between >>>>>>>>>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306] >>>>>>>>>>>>> ==> mnt-SITE_data1.log <== >>>>>>>>>>>>> The message "I [MSGID: 108031] >>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] >>>>>>>>>>>>> 2-SITE_data1-replicate-0: >>>>>>>>>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times >>>>>>>>>>>>> between >>>>>>>>>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789] >>>>>>>>>>>>> The message "E [MSGID: 101191] >>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed >>>>>>>>>>>>> to dispatch >>>>>>>>>>>>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] >>>>>>>>>>>>> and >>>>>>>>>>>>> [2019-01-30 20:38:20.546355] >>>>>>>>>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031] >>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] >>>>>>>>>>>>> 2-SITE_data1-replicate-0: >>>>>>>>>>>>> selecting local read_child SITE_data1-client-0 >>>>>>>>>>>>> ==> mnt-SITE_data3.log <== >>>>>>>>>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031] >>>>>>>>>>>>> [afr-common.c:2543:afr_local_discovery_cbk] >>>>>>>>>>>>> 2-SITE_data3-replicate-0: >>>>>>>>>>>>> selecting local read_child SITE_data3-client-0 >>>>>>>>>>>>> ==> mnt-SITE_data1.log <== >>>>>>>>>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191] >>>>>>>>>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed >>>>>>>>>>>>> to dispatch >>>>>>>>>>>>> handler >>>>>>>>>>>> >>>>>>>>>>>> 
>>>>>>>>>>>> I'm hoping raising the issue here on the mailing list may bring >>>>>>>>>>>> some additional eyeballs and get them both fixed. >>>>>>>>>>>> >>>>>>>>>>>> Thanks. >>>>>>>>>>>> >>>>>>>>>>>> Sincerely, >>>>>>>>>>>> Artem >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I found a similar issue here: >>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's >>>>>>>>>>>>> a comment from 3 days ago from someone else with 5.3 who started >>>>>>>>>>>>> seeing the >>>>>>>>>>>>> spam. >>>>>>>>>>>>> >>>>>>>>>>>>> Here's the command that repeats over and over: >>>>>>>>>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] >>>>>>>>>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) >>>>>>>>>>>>> [0x7fd966fcd329] >>>>>>>>>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) >>>>>>>>>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) >>>>>>>>>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument] >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> +Milind Changire <[email protected]> Can you check why this >>>>>>>>>>> message is logged and send a fix? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>> Is there any fix for this issue? >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks. >>>>>>>>>>>>> >>>>>>>>>>>>> Sincerely, >>>>>>>>>>>>> Artem >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK >>>>>>>>>>>>> Mirror <http://www.apkmirror.com/>, Illogical Robot LLC >>>>>>>>>>>>> beerpla.net | +ArtemRussakovskii >>>>>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR >>>>>>>>>>>>> <http://twitter.com/ArtemR> >>>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Gluster-users mailing list >>>>>>>>>>>> [email protected] >>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>> Gluster-users mailing list >>>>>>>>>> [email protected] >>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Amar Tumballi (amarts) >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>> Gluster-users mailing list >>>>>> [email protected] >>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>>
_______________________________________________ Gluster-users mailing list [email protected] https://lists.gluster.org/mailman/listinfo/gluster-users
