I frequently run into a problem where some temporary condition causes bricks
to be shut down. The brick health-check feature is what shuts them down, and
according to
https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/brick-failure-detection/
a brick stopped this way stays down and is not restarted (by design).
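
To make the failure mode concrete, here is a minimal C sketch of a write-based health check. It assumes, based only on the `aio_write()` call named in the brick log below, that the check writes a timestamp to a file with POSIX `aio_write(3)` and then waits a bounded time for completion. `health_check_write` and its `timeout_sec` parameter are hypothetical, and this is not GlusterFS's actual `posix_fs_health_check` code. What it illustrates is that an `EAGAIN` from `aio_write()` itself means the request was never queued, so the check fails immediately instead of timing out.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical write-based health check (not GlusterFS code).
 * Writes the current time to `path` with POSIX AIO and waits at most
 * `timeout_sec` seconds for the write to finish.
 * Returns 0 on success, otherwise an errno value describing the failure. */
int health_check_write(const char *path, int timeout_sec)
{
    assert(path != NULL && timeout_sec > 0);

    char stamp[32];
    int len = snprintf(stamp, sizeof stamp, "%lld\n", (long long)time(NULL));

    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0)
        return errno;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = stamp;
    cb.aio_nbytes = (size_t)len;
    cb.aio_offset = 0;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE; /* no completion signal */

    /* aio_write() fails with EAGAIN when the request cannot be queued
     * for lack of resources: the check fails at once, with no waiting. */
    if (aio_write(&cb) == -1) {
        int err = errno;
        close(fd);
        return err;
    }

    /* Wait in 1-second slices; aio_suspend() returns early on completion. */
    const struct aiocb *const list[1] = { &cb };
    const struct timespec slice = { .tv_sec = 1, .tv_nsec = 0 };
    int err;
    for (int waited = 0;
         (err = aio_error(&cb)) == EINPROGRESS && waited < timeout_sec;
         waited++)
        aio_suspend(list, 1, &slice);

    if (err == EINPROGRESS) {
        /* Timed out: try to cancel, then let the request settle so the
         * buffer and descriptor are not released while still in use. */
        aio_cancel(fd, &cb);
        while (aio_error(&cb) == EINPROGRESS)
            aio_suspend(list, 1, NULL);
        (void)aio_return(&cb);
        close(fd);
        return ETIMEDOUT;
    }

    ssize_t written = aio_return(&cb); /* reap the finished request */
    close(fd);
    if (err != 0)
        return err;
    return written == len ? 0 : EIO;
}
```

On a healthy filesystem `health_check_write("/tmp/health_check", 10)` returns 0. A caller that treats any nonzero return as fatal would go down on a single transient `EAGAIN`, which matches the behaviour described above.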

What I don't understand is:
1. What is causing this "Resource temporarily unavailable" error in the first
   place? From searching the web, it sounds like a socket timeout. Have you
   guys seen this before?
2. If this is truly a temporary failure, why do we shut down the brick
   indefinitely?
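
For reference, "Resource temporarily unavailable" is the standard C library message for errno `EAGAIN`; it is not specific to sockets. A socket read that hits its receive timeout also fails with `EAGAIN`, which likely explains the socket-timeout search results, but in the brick log below the error is returned by `aio_write()`, whose man page describes `EAGAIN` as "Out of resources" (the request could not be queued). A small sketch for mapping a logged message back to its errno follows; `errno_from_message` is a hypothetical helper, and the upper bound of 256 on errno values is an assumption that holds on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: map the bracketed text from a log line, such as
 * "returned [Resource temporarily unavailable]", back to the errno value
 * whose strerror() text matches it. Returns 0 if nothing matches.
 * Assumes errno values are below 256, which holds on Linux. */
int errno_from_message(const char *msg)
{
    assert(msg != NULL);
    for (int err = 1; err < 256; err++) {
        if (strcmp(strerror(err), msg) == 0)
            return err;
    }
    return 0;
}
```

With glibc in the default "C" locale, `errno_from_message("Resource temporarily unavailable")` returns `EAGAIN`, and the "(No data available)" text from the GlusterD log maps to `ENODATA`.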

Should I try either of the following?
1. Increase 'network.ping-timeout' or 'client.grace-timeout'.
2. Disable the health-check feature by setting:
   # gluster volume set <VOLNAME> storage.health-check-interval 0

The brick log looks like this at the time it is shut down:

------------------
[2019-05-08 13:48:33.642605] W [MSGID: 113075] [posix-helpers.c:1895:posix_fs_health_check] 0-heketidbstorage-posix: aio_write() on /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick/.glusterfs/health_check returned [Resource temporarily unavailable]
[2019-05-08 13:48:33.749246] M [MSGID: 113075] [posix-helpers.c:1962:posix_health_check_thread_proc] 0-heketidbstorage-posix: health-check failed, going down
[2019-05-08 13:48:34.000428] M [MSGID: 113075] [posix-helpers.c:1981:posix_health_check_thread_proc] 0-heketidbstorage-posix: still alive! -> SIGTERM
[2019-05-08 13:49:04.597061] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f16fdd94dd5] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x556e53da2d65] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x556e53da2b8b] ) 0-: received signum (15), shutting down
------------------

The GlusterD log shows this shortly after:

------------------
[2019-05-08 13:49:04.673536] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick on port 49152
[2019-05-08 13:49:05.003848] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/fe4ac75011a4de0e.socket failed (No data available)
------------------

Any guidance would be greatly appreciated!

Best,
Jeff Bischoff

_______________________________________________
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users