On 03/23/2016 05:52 PM, Ravishankar N wrote:
> On 03/23/2016 02:01 PM, Daniel Kanchev wrote:
>> Hi, everyone.
>>
>> We are using GlusterFS configured in the following way:
>>
>> [root@web1 ~]# gluster volume info
>>  
>> Volume Name: share
>> Type: Replicate
>> Volume ID: hidden data on purpose
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: c10839:/gluster
>> Brick2: c10840:/gluster
>> Brick3: web3:/gluster
>> Options Reconfigured:
>> cluster.consistent-metadata: on
>> performance.readdir-ahead: on
>> nfs.disable: true
>> cluster.self-heal-daemon: on
>> cluster.metadata-self-heal: on
>> auth.allow: hidden data on purpose
>> performance.cache-size: 256MB
>> performance.io-thread-count: 8
>> performance.cache-refresh-timeout: 3
>>
>> Here is the output of the status command for the volume and the peers:
>>
>> [root@web1 ~]# gluster volume status
>> Status of volume: share
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick c10839:/gluster                       49152     0          Y       540 
>> Brick c10840:/gluster                       49152     0          Y       533 
>> Brick web3:/gluster                         49152     0          Y       782 
>> Self-heal Daemon on localhost               N/A       N/A        Y       602 
>> Self-heal Daemon on web3                    N/A       N/A        Y       790 
>> Self-heal Daemon on web4                    N/A       N/A        Y       636 
>> Self-heal Daemon on web2                    N/A       N/A        Y       523 
>>  
>> Task Status of Volume share
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> [root@web1 ~]# gluster peer status
>> Number of Peers: 3
>>
>> Hostname: web3
>> Uuid: b138b4d5-8623-4224-825e-1dfdc3770743
>> State: Peer in Cluster (Connected)
>>
>> Hostname: web2
>> Uuid: b3926959-3ae8-4826-933a-4bf3b3bd55aa
>> State: Peer in Cluster (Connected)
>> Other names:
>> c10840.sgvps.net
>>
>> Hostname: web4
>> Uuid: f7553cba-c105-4d2c-8b89-e5e78a269847
>> State: Peer in Cluster (Connected)
>>
>> All in all, we have three servers that host the bricks and actually store
>> the data, and one server which is just a peer and does not store any data.
>>
>> *The Problem*: If any of the 4 servers goes down, the cluster continues
>> to work as expected. However, once that server comes back up, the whole
>> cluster stalls for a certain period of time (30-120 seconds). During this
>> period no I/O operations can be executed, and the apps that use the data
>> on the GlusterFS volume simply go down because they cannot read/write any
>> data.
>>
>> We suspect that the issue is related to the self-heal daemons, but we are
>> not sure. Could you please advise how to debug this issue and what could
>> be causing the whole cluster to stall? If it is self-heal, as we suspect,
>> do you think it is OK to disable it? If some of the settings are causing
>> this problem, could you please advise how to reconfigure the cluster to
>> avoid it?
>>
> 
> What version of gluster is this?

3.7.6

> Do you observe the problem even when only the 4th 'non data' server comes up? 
> In that case it is unlikely that self-heal is the issue.

No

> Are the clients using FUSE or NFS mounts?

FUSE
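
In case it helps, here is a sketch of the standard checks we could run the next time a node rejoins, to see whether healing correlates with the stall (the volume name "share" and log path are from our setup above; corrections welcome):

```shell
# List files pending heal right after the node comes back up.
# A large backlog here at the time of the stall would point at self-heal.
gluster volume heal share info

# Watch the self-heal daemon log on a brick node while the node rejoins:
tail -f /var/log/glusterfs/glustershd.log

# If client-side healing turns out to be the culprit, the client-side
# heal paths can be turned off while keeping the self-heal daemon running:
gluster volume set share cluster.data-self-heal off
gluster volume set share cluster.metadata-self-heal off
gluster volume set share cluster.entry-self-heal off
```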

> -Ravi
>> If any info from the logs is requested, please let us know what you need.
>>
>> Thanks in advance!
>>
>> Regards,
>> Daniel
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> [email protected]
>> http://www.gluster.org/mailman/listinfo/gluster-users
> 
> 
