On 03/23/2016 02:01 PM, Daniel Kanchev wrote:
Hi, everyone.
We are using GlusterFS configured in the following way:
[root@web1 ~]# gluster volume info
Volume Name: share
Type: Replicate
Volume ID: hidden data on purpose
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: c10839:/gluster
Brick2: c10840:/gluster
Brick3: web3:/gluster
Options Reconfigured:
cluster.consistent-metadata: on
performance.readdir-ahead: on
nfs.disable: true
cluster.self-heal-daemon: on
cluster.metadata-self-heal: on
auth.allow: hidden data on purpose
performance.cache-size: 256MB
performance.io-thread-count: 8
performance.cache-refresh-timeout: 3
Here is the output of the status command for the volume and the peers:
[root@web1 ~]# gluster volume status
Status of volume: share
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick c10839:/gluster                       49152     0          Y       540
Brick c10840:/gluster                       49152     0          Y       533
Brick web3:/gluster                         49152     0          Y       782
Self-heal Daemon on localhost               N/A       N/A        Y       602
Self-heal Daemon on web3                    N/A       N/A        Y       790
Self-heal Daemon on web4                    N/A       N/A        Y       636
Self-heal Daemon on web2                    N/A       N/A        Y       523
Task Status of Volume share
------------------------------------------------------------------------------
There are no active volume tasks
[root@web1 ~]# gluster peer status
Number of Peers: 3

Hostname: web3
Uuid: b138b4d5-8623-4224-825e-1dfdc3770743
State: Peer in Cluster (Connected)

Hostname: web2
Uuid: b3926959-3ae8-4826-933a-4bf3b3bd55aa
State: Peer in Cluster (Connected)
Other names:
c10840.sgvps.net

Hostname: web4
Uuid: f7553cba-c105-4d2c-8b89-e5e78a269847
State: Peer in Cluster (Connected)
All in all, we have three servers that hold bricks and actually store
the data, and one additional server (web4) which is just a peer with no
brick, connected to one of the other servers.
*The Problem*: If any of the four servers goes down, the cluster
continues to work as expected. However, once that server comes back up,
the whole cluster stalls for a certain period of time (30-120 seconds).
During this period no I/O operations can be executed, and the apps that
use the data on the GlusterFS volume simply go down because they cannot
read or write any data.
We suspect that the issue is related to the self-heal daemons, but we
are not sure. Could you please advise how to debug this issue and what
could be causing the whole cluster to stall? If it is self-heal, as we
suspect, do you think it is OK to disable it? If some of the settings
are causing this problem, could you please advise how to configure the
cluster to avoid it?
What version of gluster is this?
Do you observe the problem even when only the 4th, non-data server
comes back up? In that case it is unlikely that self-heal is the issue.
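If self-heal does turn out to be involved, you can watch the pending-heal queue while the rebooted node rejoins, and temporarily disable client-side healing as a test (the self-heal daemon keeps healing in the background). A sketch with the standard gluster CLI, using the volume name `share` from above:

```shell
# List files/entries still pending heal on each brick
gluster volume heal share info

# Cumulative self-heal statistics per brick (crawl times, healed counts)
gluster volume heal share statistics

# As an experiment only: disable healing in the client I/O path;
# the self-heal daemon still repairs files in the background
gluster volume set share cluster.data-self-heal off
gluster volume set share cluster.entry-self-heal off
gluster volume set share cluster.metadata-self-heal off
```

If the stall disappears with client-side heals off, that points at heals being triggered in the mount path rather than at the daemon itself.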
Are the clients using FUSE or NFS mounts?
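To answer those two questions, something like the following could be run on the servers and clients (standard gluster and mount tooling; no gluster-specific assumptions beyond the installed CLI):

```shell
# GlusterFS version on each node
glusterfs --version

# Mount type on the clients: FUSE mounts show up as type fuse.glusterfs,
# NFS mounts as type nfs
mount | grep -E 'glusterfs|nfs'
```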
-Ravi
If any info from the logs is needed, please let us know what you need.
Thanks in advance!
Regards,
Daniel
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users