From the code alone it is difficult to say why the brick crashed in this
function, but I think we can avoid the crash by saving hostname/base_path
at the start of posix_fs_health_check(). We do cancel the health check
thread when posix_fini is called, and posix_fs_health_check() runs between
cancellation points, so ideally hostname and base_path should be freed only
after this function returns. Here, however, it seems hostname/base_path
had already been freed by the time gf_event was called.
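
A rough sketch of what I mean by saving the values up front is below. It is
not the actual posix xlator code: struct demo_priv, struct demo_brick_id,
demo_snapshot() and demo_format_event() are hypothetical names made up for
illustration, the buffer sizes are arbitrary, and the event string is a
shortened version of the real one. Note that copying early only narrows the
window; if priv is already freed before the copy is taken, the copy itself
would read freed memory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the xlator private data; only the two
 * fields discussed in this thread are modelled. */
struct demo_priv {
    char *hostname;
    char *base_path;
};

/* Local snapshot that does not point back into priv. Sizes are an
 * assumption chosen for the sketch. */
struct demo_brick_id {
    char hostname[256];
    char base_path[4096];
};

/* Copy src into dst, always NUL-terminated; a NULL src becomes "". */
static void
demo_copy(char *dst, size_t dst_len, const char *src)
{
    assert(dst != NULL && dst_len > 0);
    snprintf(dst, dst_len, "%s", src ? src : "");
}

/* Meant to be called first thing in the health check function. */
static void
demo_snapshot(const struct demo_priv *priv, struct demo_brick_id *id)
{
    demo_copy(id->hostname, sizeof(id->hostname), priv->hostname);
    demo_copy(id->base_path, sizeof(id->base_path), priv->base_path);
}

/* Build the event text from the snapshot only, never from priv.
 * Returns the snprintf() result. */
static int
demo_format_event(const struct demo_brick_id *id, const char *op,
                  int timeout, char *out, size_t out_len)
{
    return snprintf(out, out_len, "op=%s;brick=%s:%s timeout is %d", op,
                    id->hostname, id->base_path, timeout);
}
```

With this shape the caller takes the snapshot on entry, does its file I/O,
and passes only the snapshot to the event call, so a later free of
priv->hostname or priv->base_path no longer affects the message.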

Thanks,
Mohit Agrawal

On Wed, Nov 14, 2018 at 9:42 PM Nithya Balachandran <nbala...@redhat.com>
wrote:

> I am also seeing a bunch of these errors in the logs. Do they really need
> to be Info logs?
>
> <snip>
>
> [2018-11-14 06:01:18.597029] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 382 (errno:61:No data available);
> returning ENODATA
> [2018-11-14 06:01:18.597640] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 385 (errno:61:No data available);
> returning ENODATA
> [2018-11-14 06:01:18.598114] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 344 (errno:61:No data available);
> returning ENODATA
> [2018-11-14 06:01:18.598737] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 343 (errno:61:No data available);
> returning ENODATA
> [2018-11-14 06:01:18.599172] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 384 (errno:61:No data available);
> returning ENODATA
> [2018-11-14 06:01:18.599789] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 386 (errno:61:No data available);
> returning ENODATA
> [2018-11-14 06:01:18.600237] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 387 (errno:61:No data available);
> returning ENODATA
> [2018-11-14 06:01:18.579405] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 324 (errno:61:No data available);
> returning ENODATA
> [2018-11-14 06:01:18.563771] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 187 (errno:11:Resource temporarily
> unavailable); returning ENODATA
> [2018-11-14 06:01:18.560416] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 161 (errno:0:Success); returning
> ENODATA
> [2018-11-14 06:01:18.564826] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 194 (errno:61:No data available);
> returning ENODATA
> [2018-11-14 06:01:18.561503] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 306 (errno:61:No data available);
> returning ENODATA
> [2018-11-14 06:01:18.567985] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 164 (errno:0:Success); returning
> ENODATA
> [2018-11-14 06:01:18.566924] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 198 (errno:61:No data available);
> returning ENODATA
> [2018-11-14 06:01:18.562567] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 346 (errno:61:No data available);
> returning ENODATA
> [2018-11-14 06:01:18.580451] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 363 (errno:61:No data available);
> returning ENODATA
> [2018-11-14 06:01:23.165797] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 346 (errno:11:Resource temporarily
> unavailable); returning ENODATA
> [2018-11-14 06:01:23.166019] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 207 (errno:0:Success); returning
> ENODATA
> [2018-11-14 06:01:23.166905] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 197 (errno:0:Success); returning
> ENODATA
> [2018-11-14 06:01:23.167541] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 273 (errno:61:No data available);
> returning ENODATA
> [2018-11-14 06:01:23.167904] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 275 (errno:61:No data available);
> returning ENODATA
> [2018-11-14 06:01:23.168502] I [socket.c:693:__socket_rwv]
> 0-tcp.patchy-vol01-server: EOF on socket 277 (errno:61:No data available);
> returning ENODATA
>
> </snip>
>
>
> [nbalacha@dhcp35-62 bricks]$ grep "EOF on socket" d-backends-vol01-brick*
> |wc -l
> 1580
>
>
>
> Regards,
> Nithya
>
> On 14 November 2018 at 21:03, Shyam Ranganathan <srang...@redhat.com>
> wrote:
>
>> On 11/14/2018 10:04 AM, Nithya Balachandran wrote:
>> > Hi Mohit,
>> >
>> > The regression run in the subject has failed because a brick has
>> crashed in
>> >
>> > bug-1432542-mpx-restart-crash.t
>> >
>> >
>> > *06:03:38* 1 test(s) generated core
>> > *06:03:38* ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
>> > *06:03:38*
>> >
>> >
>> > The brick process has crashed in posix_fs_health_check as  this->priv
>> > contains garbage. It looks like it might have been freed already. Can
>> > you take a look at it?
>>
>> Sounds like another incarnation of:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1636570
>>
>> @mohit, any further clues?
>>
>> >
>> >
>> >
>> > (gdb) bt
>> > #0  0x00007f4019ea1f19 in vfprintf () from ./lib64/libc.so.6
>> > #1  0x00007f4019eccf49 in vsnprintf () from ./lib64/libc.so.6
>> > #2  0x00007f401b87705a in gf_vasprintf (string_ptr=0x7f3e81ff99f0,
>> > format=0x7f400df32f40 "op=%s;path=%s;error=%s;brick=%s:%s timeout is
>> > %d", arg=0x7f3e81ff99f8)
>> >     at
>> >
>> /home/jenkins/root/workspace/centos7-regression/libglusterfs/src/mem-pool.c:234
>> > #3  0x00007f401b8de6e2 in _gf_event
>> > (event=EVENT_POSIX_HEALTH_CHECK_FAILED, fmt=0x7f400df32f40
>> > "op=%s;path=%s;error=%s;brick=%s:%s timeout is %d")
>> >     at
>> >
>> /home/jenkins/root/workspace/centos7-regression/libglusterfs/src/events.c:89
>> > #4  0x00007f400def07f9 in posix_fs_health_check (this=0x7f3fd78b7840) at
>> >
>> /home/jenkins/root/workspace/centos7-regression/xlators/storage/posix/src/posix-helpers.c:1960
>> > #5  0x00007f400def0926 in posix_health_check_thread_proc
>> > (data=0x7f3fd78b7840)
>> >     at
>> >
>> /home/jenkins/root/workspace/centos7-regression/xlators/storage/posix/src/posix-helpers.c:2005
>> > #6  0x00007f401a68ae25 in start_thread () from ./lib64/libpthread.so.0
>> > #7  0x00007f4019f53bad in clone () from ./lib64/libc.so.6
>> > (gdb) f 4
>> > #4  0x00007f400def07f9 in posix_fs_health_check (this=0x7f3fd78b7840) at
>> >
>> /home/jenkins/root/workspace/centos7-regression/xlators/storage/posix/src/posix-helpers.c:1960
>> > 1960        gf_event(EVENT_POSIX_HEALTH_CHECK_FAILED,
>> > (gdb) l
>> > 1955        sys_close(fd);
>> > 1956    }
>> > 1957    if (ret && file_path[0]) {
>> > 1958        gf_msg(this->name, GF_LOG_WARNING, errno,
>> > P_MSG_HEALTHCHECK_FAILED,
>> > 1959               "%s() on %s returned", op, file_path);
>> > 1960        gf_event(EVENT_POSIX_HEALTH_CHECK_FAILED,
>> > 1961                 "op=%s;path=%s;error=%s;brick=%s:%s timeout is
>> %d", op,
>> > 1962                 file_path, strerror(op_errno), priv->hostname,
>> > priv->base_path,
>> > 1963                 timeout);
>> > 1964    }
>> > (gdb) p pri->hostname
>> > No symbol "pri" in current context.
>> > *(gdb) p priv->hostname*
>> > *$14 = 0xa200 <error: Cannot access memory at address 0xa200>*
>> > *(gdb) p priv->base_path*
>> > *$15 = 0x7f3ddeadc0de00 <error: Cannot access memory at address
>> > 0x7f3ddeadc0de00>*
>> > (gdb)
>> >
>> >
>> >
>> > Thanks,
>> > Nithya
>> >
>> >
>> > _______________________________________________
>> > Gluster-devel mailing list
>> > Gluster-devel@gluster.org
>> > https://lists.gluster.org/mailman/listinfo/gluster-devel
>> >
>>
>
>