The solution has been found, but it's kind of ugly.
1. On the old node: gluster peer detach mel-storage04
2. On the new node: stop glusterd and wipe /var/lib/glusterd
3. Restart glusterd on the new node
4. On the old node, run:

   for Q in `gluster volume list`; do
       gluster volume reset $Q
   done

5. On the old node: gluster peer probe mel-storage04
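For reference, here's the whole sequence as one script, run from the old (good) node. This is a sketch of what I did, not a polished tool: the hostname, the Debian-style service name "glusterfs-server", and passwordless ssh to the new node are all assumptions about my setup -- adjust for yours.

```shell
#!/bin/sh
# Recovery sketch: detach a rejected peer, wipe its glusterd state,
# reset volume options so checksums regenerate, then re-probe.
NEW=mel-storage04    # hostname of the new peer (assumption)

# 1. Drop the rejected peer from the cluster
gluster peer detach $NEW

# 2-3. On the new node: stop glusterd, wipe its state dir, restart.
#      Service name assumes Debian/Ubuntu packaging.
ssh $NEW 'service glusterfs-server stop && rm -rf /var/lib/glusterd/*'
ssh $NEW 'service glusterfs-server start'

# 4. Back on the old node: reset every volume's options
for Q in `gluster volume list`; do
    gluster volume reset $Q
done

# 5. Re-probe the cleaned peer
gluster peer probe $NEW
```

Obviously double-check the rm -rf target before running anything like this.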
After this it successfully connected the new node.
I have no idea why this was required.
We still can't remove, replace, or add bricks, but I'll continue that in
another thread.
-T
On 07/08/13 10:51, Toby Corkindale wrote:
On 06/08/13 21:25, Kaushal M wrote:
Toby,
What versions of gluster are on the peers? And does the cluster have
just two peers or more?
Version 3.3.1.
The cluster has/had two nodes; we're trying to replace one with another
one.
On Tue, Aug 6, 2013 at 4:32 PM, Toby Corkindale
<[email protected]> wrote:
----- Original Message -----
From: "Toby Corkindale" <[email protected]>
To: [email protected]
Sent: Tuesday, 6 August, 2013 6:26:59 PM
Subject: Re: [Gluster-users] peer status rejected (connected)
On 06/08/13 18:12, Toby Corkindale wrote:
Hi,
What does it mean when you use "peer probe" to add a new host, but
then
afterwards the "peer status" is reported as "Rejected" yet
"Connected"?
And of course -- how does one fix this?
gluster> peer status
Number of Peers: 1
Hostname: 192.168.10.32
Uuid: 32497846-6e02-4b68-b147-6f4b936b3373
State: Peer Rejected (Connected)
It's worth noting that the attempt to probe the peer was listed as
successful though:
gluster> peer probe mel-storage04
Probe successful
gluster> peer status
Number of Peers: 1
Hostname: mel-storage04
Uuid: 6254c24d-29d4-4794-8159-3c2b03b34798
State: Peer Rejected (Connected)
After searching around some more, I saw that this issue is usually
caused by two peers joining, when one has a very out of date volume
list.
And indeed, in the log files I see messages about checksums failing
to agree on volumes being exchanged.
The odd thing is, this is a fresh server, running the same version of
glusterfs.
I tried stopping the services entirely, rm -rf /var/lib/glusterfs/*,
and then started up again and tried probing that peer -- and received
the same Rejection.
I'm confused as to how it could possibly be getting a different
volume checksum, when it didn't even have its own copy.
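(If anyone wants to diagnose the same symptom: as far as I can tell, the checksum glusterd compares is stored on disk under /var/lib/glusterd/vols/<vol>/cksum, at least on 3.3.x. So you can compare it directly between the two peers -- hostnames and volume name below are placeholders for my setup:)

```shell
#!/bin/sh
# Compare the on-disk volume checksum files between two peers.
# VOL, OLD, NEW are placeholders; OLD is a hypothetical hostname
# for the existing node. Assumes passwordless ssh to both.
VOL=myvolume
OLD=mel-storage03
NEW=mel-storage04

for H in $OLD $NEW; do
    echo "== $H =="
    ssh $H "cat /var/lib/glusterd/vols/$VOL/cksum"
done
```

If the two outputs differ, that's the mismatch glusterd is complaining about in the logs.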
Does the community have any suggestions about resolving this?
See also, inability to remove or replace bricks in separate message -
which might be related, although the errors occur even if run on the
cluster without this problematic peer attached at all.
-Toby
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users