On 03/23/2016 05:52 PM, Ravishankar N wrote:
> On 03/23/2016 02:01 PM, Daniel Kanchev wrote:
>> Hi, everyone.
>>
>> We are using GlusterFS configured in the following way:
>>
>> [root@web1 ~]# gluster volume info
>>
>> Volume Name: share
>> Type: Replicate
>> Volume ID: hidden data on purpose
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: c10839:/gluster
>> Brick2: c10840:/gluster
>> Brick3: web3:/gluster
>> Options Reconfigured:
>> cluster.consistent-metadata: on
>> performance.readdir-ahead: on
>> nfs.disable: true
>> cluster.self-heal-daemon: on
>> cluster.metadata-self-heal: on
>> auth.allow: hidden data on purpose
>> performance.cache-size: 256MB
>> performance.io-thread-count: 8
>> performance.cache-refresh-timeout: 3
>>
>> Here is the output of the status command for the volume and the peers:
>>
>> [root@web1 ~]# gluster volume status
>> Status of volume: share
>> Gluster process                       TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick c10839:/gluster                 49152     0          Y       540
>> Brick c10840:/gluster                 49152     0          Y       533
>> Brick web3:/gluster                   49152     0          Y       782
>> Self-heal Daemon on localhost         N/A       N/A        Y       602
>> Self-heal Daemon on web3              N/A       N/A        Y       790
>> Self-heal Daemon on web4              N/A       N/A        Y       636
>> Self-heal Daemon on web2              N/A       N/A        Y       523
>>
>> Task Status of Volume share
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> [root@web1 ~]# gluster peer status
>> Number of Peers: 3
>>
>> Hostname: web3
>> Uuid: b138b4d5-8623-4224-825e-1dfdc3770743
>> State: Peer in Cluster (Connected)
>>
>> Hostname: web2
>> Uuid: b3926959-3ae8-4826-933a-4bf3b3bd55aa
>> State: Peer in Cluster (Connected)
>> Other names:
>> c10840.sgvps.net
>>
>> Hostname: web4
>> Uuid: f7553cba-c105-4d2c-8b89-e5e78a269847
>> State: Peer in Cluster (Connected)
>>
>> In total, we have three servers that actually store the data and one
>> server that is just a peer and is connected to one of the other servers.
>>
>> *The Problem*: If any of the four servers goes down, the cluster continues
>> to work as expected. However, once that server comes back up, the whole
>> cluster stalls for a period of time (30-120 seconds). During this period
>> no I/O operations can be executed, and the applications that use the data
>> on the GlusterFS volume simply go down because they cannot read or write
>> any data.
>>
>> We suspect that the issue is related to the self-heal daemons, but we are
>> not sure. Could you please advise how to debug this issue and what could
>> be causing the whole cluster to stall? If it is self-heal, as we suspect,
>> do you think it is OK to disable it? If some of the settings are causing
>> this problem, could you please advise how to configure the cluster to
>> avoid it?
>>
>
> What version of gluster is this?
3.7.6

> Do you observe the problem even when only the 4th 'non data' server comes up?
> In that case it is unlikely that self-heal is the issue.

No.

> Are the clients using FUSE or NFS mounts?

FUSE.

> -Ravi
>
>> If any info from the logs is requested, please let us know what you need.
>>
>> Thanks in advance!
>>
>> Regards,
>> Daniel
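
In the meantime, here is what we plan to run the next time a node rejoins, to see whether a heal backlog is what blocks the clients. This is only our debugging sketch for the "share" volume from the output above; please correct us if these are not the right things to look at:

[root@web1 ~]# gluster volume heal share info
[root@web1 ~]# gluster volume heal share info split-brain
[root@web1 ~]# gluster volume heal share statistics heal-count

The first two should list the entries each brick still needs to heal (and any split-brain files), and the heal-count should tell us how large the backlog is while the clients are stalled. We will also watch /var/log/glusterfs/glustershd.log on the node that just came back, and the FUSE mount log on the clients, during the 30-120 second window.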
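If the heals triggered from the FUSE clients turn out to be what blocks I/O when a node rejoins, one option we are considering (we have not tested this on 3.7.6, so treat it as an assumption on our side) is to leave healing entirely to the self-heal daemon and disable the client-side heals:

[root@web1 ~]# gluster volume set share cluster.data-self-heal off
[root@web1 ~]# gluster volume set share cluster.metadata-self-heal off
[root@web1 ~]# gluster volume set share cluster.entry-self-heal off

The idea is to keep cluster.self-heal-daemon on so files still get healed in the background, while the mounts themselves no longer spend time healing during lookups. Would that be a reasonable approach here, or is there a better way to limit the impact of heals on the clients?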
