Hi Atin,

Nice catch! As you said, there was a mistake in the hosts file on gl4: the entries for gl5 and gl6 were missing. Now it works fine.

Thanks,
Cédric
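For anyone hitting the same symptom, here is a minimal sketch of the kind of /etc/hosts entries that were missing on gl4, plus a quick resolution check; the IP addresses are placeholders, not values from this thread:

    # /etc/hosts on gl4 (illustrative addresses)
    192.0.2.15   gl5
    192.0.2.16   gl6

    # verify gl4 can now resolve its peers
    root@gl4:~# getent hosts gl5
    192.0.2.15      gl5
    root@gl4:~# getent hosts gl6
    192.0.2.16      gl6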
> On 14 Dec 2016, at 05:10, Atin Mukherjee <[email protected]> wrote:
>
> On Wed, Dec 14, 2016 at 9:36 AM, Atin Mukherjee <[email protected]> wrote:
>
> From gl4.dump file:
>
> glusterd.peer4.hostname=gl5
> glusterd.peer4.port=0
> glusterd.peer4.state=3
> glusterd.peer4.quorum-action=0
> glusterd.peer4.quorum-contrib=2
> glusterd.peer4.detaching=0
> glusterd.peer4.locked=0
> glusterd.peer4.rpc.peername=
> glusterd.peer4.rpc.connected=0   <===== this indicates that gl5 is not connected with gl4, so the add-brick command failed as it is supposed to in this case
> glusterd.peer4.rpc.total-bytes-read=0
> glusterd.peer4.rpc.total-bytes-written=0
> glusterd.peer4.rpc.ping_msgs_sent=0
> glusterd.peer4.rpc.msgs_sent=0
>
> And the same holds true for gl6 as well, as per this dump. So the issue is with the gl4 node.
>
> Now, in gl4's glusterd log I see repetitive entries of the following logs:
>
> [2016-12-13 16:35:31.438462] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl5
> [2016-12-13 16:35:33.440155] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl6
> [2016-12-13 16:35:34.441639] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl5
> [2016-12-13 16:35:36.454546] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl6
> [2016-12-13 16:35:37.456062] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gl5
>
> The above indicates that gl4 is not able to resolve the DNS names for gl5 & gl6, whereas gl5 & gl6 can resolve gl4. Please check your DNS configuration and see if there are any incorrect entries there. From our side, what we need to check is why peer status didn't show both gl5 & gl6 as disconnected.
>
> Can you run gluster peer status from gl4 and see if both gl5 & gl6 are shown as disconnected? If so, that's expected: since gl5 & gl6 were connected to all the nodes apart from gl4, peer status on all the other nodes would show them as connected, and that's expected behaviour. Please do confirm.
>
> On Wed, Dec 14, 2016 at 12:44 AM, Cedric Lemarchand <[email protected]> wrote:
>
> Thanks Atin, the files you asked for: https://we.tl/XrOvFhffGq
>
>> On 13 Dec 2016, at 19:08, Atin Mukherjee <[email protected]> wrote:
>>
>> Thanks, we will get back on this. In the meantime, can you please also share the glusterd statedump file from both the nodes? The way to take a statedump is 'kill -SIGUSR1 $(pidof glusterd)' and the file can be found in the /var/run/gluster directory.
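As a side note, the check Atin performs above can be repeated on any node by taking a glusterd statedump (the command quoted above) and looking at the per-peer RPC state. The dump file name pattern below is an assumption (glusterd usually writes something like glusterdump.<pid>.dump.<timestamp> under /var/run/gluster, so adjust the glob if yours differ); the field names are the ones shown in the excerpt above:

    root@gl4:~# kill -SIGUSR1 $(pidof glusterd)
    root@gl4:~# grep -E 'peer[0-9]+\.(hostname|rpc\.connected)' /var/run/gluster/glusterdump.*

A peer whose rpc.connected value is 0 is one this glusterd cannot currently reach, regardless of what peer status reports on the other nodes.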
>> On Tue, 13 Dec 2016 at 22:11, Cedric Lemarchand <[email protected]> wrote:
>>
>> 1. sorry, 3.9.0-1
>> 2. no, it does nothing
>> 3. here they are, from gl1 to gl6: https://we.tl/EPaMs6geoR
>>
>>> On 13 Dec 2016, at 16:49, Atin Mukherjee <[email protected]> wrote:
>>>
>>> And 3. In case 2 doesn't work, please provide the glusterd log files from gl1 & gl5.
>>>
>>> On Tue, Dec 13, 2016 at 9:16 PM, Atin Mukherjee <[email protected]> wrote:
>>>
>>> 1. Could you mention which gluster version you are running?
>>> 2. Does restarting the glusterd instance on gl1 & gl5 solve the issue (after removing the volume-id xattr from the bricks)?
>>>
>>> On Tue, Dec 13, 2016 at 8:56 PM, Cedric Lemarchand <[email protected]> wrote:
>>>
>>> Hello,
>>>
>>> When I try to add 3 bricks to a working cluster composed of 3 nodes / 3 bricks in dispersed mode 2+1, it fails like this:
>>>
>>> root@gl1:~# gluster volume add-brick vol1 gl4:/data/br1 gl5:/data/br1 gl6:/data/br1
>>> volume add-brick: failed: Pre Validation failed on gl4. Host gl5 not connected
>>>
>>> However all peers are connected and there are no networking issues:
>>>
>>> root@gl1:~# gluster peer status
>>> Number of Peers: 5
>>>
>>> Hostname: gl2
>>> Uuid: 616f100f-a3f4-46e4-b161-ee5db5a60e26
>>> State: Peer in Cluster (Connected)
>>>
>>> Hostname: gl3
>>> Uuid: acb828b8-f4b3-42ab-a9d2-b3e7b917dc9a
>>> State: Peer in Cluster (Connected)
>>>
>>> Hostname: gl4
>>> Uuid: 813ad056-5e84-4fdb-ac13-38d24c748bc4
>>> State: Peer in Cluster (Connected)
>>>
>>> Hostname: gl5
>>> Uuid: a7933aeb-b08b-4ebb-a797-b8ecbe5a03c6
>>> State: Peer in Cluster (Connected)
>>>
>>> Hostname: gl6
>>> Uuid: 63c9a6c1-0adf-4cf5-af7b-b28a60911c99
>>> State: Peer in Cluster (Connected)
>>>
>>> When I try a second time, the error is different:
>>>
>>> root@gl1:~# gluster volume add-brick vol1 gl4:/data/br1 gl5:/data/br1 gl6:/data/br1
>>> volume add-brick: failed: Pre Validation failed on gl5. /data/br1 is already part of a volume
>>> Pre Validation failed on gl6. /data/br1 is already part of a volume
>>> Pre Validation failed on gl4. /data/br1 is already part of a volume
>>>
>>> It seems the previous try, even though it failed, created the gluster attributes on the file system, as shown by attr on gl4/5/6:
>>>
>>> Attribute "glusterfs.volume-id" has a 16 byte value for /data/br1
>>>
>>> I already purged gluster and reformatted the bricks on gl4/5/6 but the issue persists. Any ideas? Did I miss something?
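For reference, the "already part of a volume" pre-validation failure comes from the volume-id extended attribute (full name trusted.glusterfs.volume-id) that the failed attempt left on the brick directories. Below is a minimal sketch of the cleanup Atin alludes to above, to be run on each of gl4/gl5/gl6 before retrying; the trusted.gfid xattr and the .glusterfs directory only exist if the brick was actually used, so the last two commands are precautionary:

    root@gl4:~# getfattr -d -m . -e hex /data/br1             # inspect what is currently set
    root@gl4:~# setfattr -x trusted.glusterfs.volume-id /data/br1
    root@gl4:~# setfattr -x trusted.gfid /data/br1             # only if present
    root@gl4:~# rm -rf /data/br1/.glusterfs                    # only if present

A glusterd restart on the affected nodes, as Atin suggests above, completes the reset.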
>>>
>>> Some information:
>>>
>>> root@gl1:~# gluster volume info
>>>
>>> Volume Name: vol1
>>> Type: Disperse
>>> Volume ID: bb563884-0e2a-4757-9fd5-cb851ba113c3
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x (2 + 1) = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: gl1:/data/br1
>>> Brick2: gl2:/data/br1
>>> Brick3: gl3:/data/br1
>>> Options Reconfigured:
>>> features.scrub-freq: hourly
>>> features.scrub: Inactive
>>> features.bitrot: off
>>> cluster.disperse-self-heal-daemon: enable
>>> transport.address-family: inet
>>> performance.readdir-ahead: on
>>> nfs.disable: on
>>>
>>> root@gl1:~# gluster volume status
>>> Status of volume: vol1
>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick gl1:/data/br1                         49152     0          Y       23403
>>> Brick gl2:/data/br1                         49152     0          Y       14545
>>> Brick gl3:/data/br1                         49152     0          Y       11348
>>> Self-heal Daemon on localhost               N/A       N/A        Y       24766
>>> Self-heal Daemon on gl4                     N/A       N/A        Y       1087
>>> Self-heal Daemon on gl5                     N/A       N/A        Y       1080
>>> Self-heal Daemon on gl3                     N/A       N/A        Y       12321
>>> Self-heal Daemon on gl2                     N/A       N/A        Y       15496
>>> Self-heal Daemon on gl6                     N/A       N/A        Y       1091
>>>
>>> Task Status of Volume vol1
>>> ------------------------------------------------------------------------------
>>> There are no active volume tasks
>>>
>>> root@gl1:~# gluster peer status
>>> Number of Peers: 5
>>>
>>> Hostname: gl2
>>> Uuid: 616f100f-a3f4-46e4-b161-ee5db5a60e26
>>> State: Peer in Cluster (Connected)
>>>
>>> Hostname: gl3
>>> Uuid: acb828b8-f4b3-42ab-a9d2-b3e7b917dc9a
>>> State: Peer in Cluster (Connected)
>>>
>>> Hostname: gl4
>>> Uuid: 813ad056-5e84-4fdb-ac13-38d24c748bc4
>>> State: Peer in Cluster (Connected)
>>>
>>> Hostname: gl5
>>> Uuid: a7933aeb-b08b-4ebb-a797-b8ecbe5a03c6
>>> State: Peer in Cluster (Connected)
>>>
>>> Hostname: gl6
>>> Uuid: 63c9a6c1-0adf-4cf5-af7b-b28a60911c99
>>> State: Peer in Cluster (Connected)
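Once gl4 resolves gl5 and gl6 and any leftover volume-id xattrs have been cleared, the original add-brick command should pass pre-validation. The expected result is sketched below; the success message and the brick-count line are illustrative, not output captured in this thread:

    root@gl1:~# gluster volume add-brick vol1 gl4:/data/br1 gl5:/data/br1 gl6:/data/br1
    volume add-brick: success
    root@gl1:~# gluster volume info vol1 | grep 'Number of Bricks'
    Number of Bricks: 2 x (2 + 1) = 6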
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
