I have three machines, all Ubuntu 12.04 running gluster 3.3.1.
storage1 192.168.6.70 on 10G, 192.168.5.70 on 1G
storage2 192.168.6.71 on 10G, 192.168.5.71 on 1G
storage3 192.168.6.72 on 10G, 192.168.5.72 on 1G
Each machine has two NICs; on every host, /etc/hosts maps all three
hostnames to the 10G addresses.
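For concreteness, the relevant /etc/hosts entries on every host look like
this, so gluster traffic should stay on the 10G network:

  192.168.6.70  storage1
  192.168.6.71  storage2
  192.168.6.72  storage3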
storage1 and storage3 were taken away for hardware changes, which included
swapping the boot disks, and had their O/S reinstalled. I assume that wiped
/var/lib/glusterd, so glusterd would have generated fresh UUIDs on first
start.
Somehow I have gotten into a state where "gluster peer status" is broken.
[on storage1]
# gluster peer status
(Just hangs here until I press ^C)
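For what it's worth, glusterd itself is still running at this point (see the
ps output further down). I believe the CLI talks to glusterd over TCP port
24007, so checking whether it is at least listening should be as simple as:

  # confirm glusterd is listening on its management port
  netstat -nltp | grep 24007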
[on storage2]
# gluster peer status
Number of Peers: 2
Hostname: 192.168.6.70
Uuid: bf320f69-2713-4b57-9003-a721a8101bc6
State: Peer in Cluster (Connected)
Hostname: storage3
Uuid: 1b058f9f-c116-496f-8b50-fb581f9625f0
State: Peer Rejected (Connected) << note "Rejected"
[on storage3]
# gluster peer status
Number of Peers: 2
Hostname: 192.168.6.70
Uuid: 698ee46d-ab8c-45f6-a6b6-7af998430a37
State: Peer in Cluster (Connected)
Hostname: storage2
Uuid: 2c0670f4-c3ba-46e0-92a8-108e71832b59
State: Peer Rejected (Connected) << note "Rejected"
Poking around the filesystem a bit:
[on storage1]
root@storage1:~# cat /var/lib/glusterd/glusterd.info
UUID=bf320f69-2713-4b57-9003-a721a8101bc6
root@storage1:~# ls /var/lib/glusterd/peers/
2c0670f4-c3ba-46e0-92a8-108e71832b59
root@storage1:~# head /var/lib/glusterd/peers/*
uuid=2c0670f4-c3ba-46e0-92a8-108e71832b59
state=4
hostname1=storage2
[on storage2]
# cat /var/lib/glusterd/glusterd.info
UUID=2c0670f4-c3ba-46e0-92a8-108e71832b59
# head /var/lib/glusterd/peers/*
==> /var/lib/glusterd/peers/1b058f9f-c116-496f-8b50-fb581f9625f0 <==
uuid=1b058f9f-c116-496f-8b50-fb581f9625f0
state=6
hostname1=storage3
==> /var/lib/glusterd/peers/698ee46d-ab8c-45f6-a6b6-7af998430a37 <==
uuid=bf320f69-2713-4b57-9003-a721a8101bc6
state=3
hostname1=192.168.6.70
[on storage3]
# cat /var/lib/glusterd/glusterd.info
UUID=1b058f9f-c116-496f-8b50-fb581f9625f0
# head /var/lib/glusterd/peers/*
==> /var/lib/glusterd/peers/2c0670f4-c3ba-46e0-92a8-108e71832b59 <==
uuid=2c0670f4-c3ba-46e0-92a8-108e71832b59
state=6
hostname1=storage2
==> /var/lib/glusterd/peers/698ee46d-ab8c-45f6-a6b6-7af998430a37 <==
uuid=698ee46d-ab8c-45f6-a6b6-7af998430a37
state=3
hostname1=192.168.6.70
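To summarize what each machine believes (first 8 hex digits of each UUID):

  host      own UUID    recorded by its peers as
  storage1  bf320f69    storage2: bf320f69 (in a file still named 698ee46d);
                        storage3: 698ee46d (stale)
  storage2  2c0670f4    storage1: 2c0670f4; storage3: 2c0670f4
  storage3  1b058f9f    storage2: 1b058f9f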
Obvious problems:
- storage1 is known to its peers by IP address, not by hostname.
- storage3 has the wrong UUID for storage1 (presumably 698ee46d... is
  storage1's old, pre-reinstall UUID; note that storage2's peer file for
  storage1 is still named after the old UUID but contains the new one).
- storage2 and storage3 each show the other as "Peer Rejected", whatever
  that means. Comparing the peer files with the status output, state=3
  appears to correspond to "Peer in Cluster" and state=6 to "Peer
  Rejected". Despite this, clients are still successfully accessing data
  on a volume on storage2 and a volume on storage3.
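The only fix I can think of for the stale UUID is to stop glusterd on
storage3 and hand-edit the peer file, roughly as sketched below. I haven't
dared run this yet, since I don't know what else might key off the old UUID:

  # on storage3 -- UNTESTED sketch, just my guess at a fix
  service glusterfs-server stop
  cd /var/lib/glusterd/peers
  # replace storage1's old UUID with its new one, and rename the file to match
  sed -i 's/698ee46d-ab8c-45f6-a6b6-7af998430a37/bf320f69-2713-4b57-9003-a721a8101bc6/' \
      698ee46d-ab8c-45f6-a6b6-7af998430a37
  mv 698ee46d-ab8c-45f6-a6b6-7af998430a37 bf320f69-2713-4b57-9003-a721a8101bc6
  service glusterfs-server start

Before resorting to that, I tried detaching the peers.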
On storage1, typing "gluster peer detach storage2" or "gluster peer detach
storage3" just hangs.
Detaching storage1 from the other side fails:
root@storage2:~# gluster peer detach storage1
One of the peers is probably down. Check with 'peer status'.
root@storage2:~# gluster peer detach 192.168.6.70
One of the peers is probably down. Check with 'peer status'.
Then I found something very suspicious on storage1:
root@storage1:~# tail /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
[2012-12-03 12:50:36.208029] I [glusterd-op-sm.c:2653:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
[2012-12-03 12:51:05.023553] I [glusterd-handler.c:1168:glusterd_handle_sync_volume] 0-glusterd: Received volume sync req for volume all
[2012-12-03 12:51:05.023741] I [glusterd-utils.c:285:glusterd_lock] 0-glusterd: Cluster lock held by bf320f69-2713-4b57-9003-a721a8101bc6
[2012-12-03 12:51:05.023761] I [glusterd-handler.c:463:glusterd_op_txn_begin] 0-management: Acquired local lock
[2012-12-03 12:51:05.024176] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 2c0670f4-c3ba-46e0-92a8-108e71832b59
[2012-12-03 12:51:05.024214] C [glusterd-op-sm.c:1946:glusterd_op_build_payload] 0-management: volname is not present in operation ctx
pending frames:
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
root@storage1:~# ps auxwww | grep gluster
root      1584  0.0  0.1 230516 10668 ?      Ssl  11:36   0:01 /usr/sbin/glusterd -p /var/run/glusterd.pid
root      6466  0.0  0.0   9392   920 pts/0  S+   13:35   0:00 grep --color=auto gluster
Hmm... so as you can see, glusterd took a SIGSEGV (signal 11), yet the
daemon was still running. After stopping and restarting it, I was able to
run "gluster peer status" again:
root@storage1:~# service glusterfs-server stop
glusterfs-server stop/waiting
root@storage1:~# ps auxwww | grep gluster
root      6478  0.0  0.0   9388   920 pts/0  S+   13:36   0:00 grep --color=auto gluster
root@storage1:~# service glusterfs-server start
glusterfs-server start/running, process 6485
root@storage1:~# gluster peer status
Number of Peers: 1
Hostname: storage2
Uuid: 2c0670f4-c3ba-46e0-92a8-108e71832b59
State: Peer in Cluster (Connected)
root@storage1:~#
But I still cannot detach from either side. (Note that storage1 now lists
only one peer; as the ls of /var/lib/glusterd/peers/ above showed, it has
no peer file for storage3 at all.) From the storage1 side:
root@storage1:~# gluster peer status
Number of Peers: 1
Hostname: storage2
Uuid: 2c0670f4-c3ba-46e0-92a8-108e71832b59
State: Peer in Cluster (Connected)
root@storage1:~# gluster peer detach storage2
Brick(s) with the peer storage2 exist in cluster
From the storage2 side:
root@storage2:~# gluster peer status
Number of Peers: 2
Hostname: 192.168.6.70
Uuid: bf320f69-2713-4b57-9003-a721a8101bc6
State: Peer in Cluster (Connected)
Hostname: storage3
Uuid: 1b058f9f-c116-496f-8b50-fb581f9625f0
State: Peer Rejected (Connected)
root@storage2:~# gluster peer detach 192.168.6.70
One of the peers is probably down. Check with 'peer status'.
root@storage2:~# gluster peer detach storage1
One of the peers is probably down. Check with 'peer status'.
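Presumably the "Brick(s) with the peer storage2 exist in cluster" message
from storage1 means some volume still has a brick on storage2, and detach
refuses until it is removed. Listing every volume's bricks should confirm
that (the grep is just my guess at a convenient filter; plain "gluster
volume info" shows the same):

  # show each volume's name and bricks
  gluster volume info | grep -E '^(Volume Name|Brick[0-9]+):'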
So this all looks broken, and as I can't find any gluster documentation
saying what these various states mean, I'm not sure how to proceed. Any
suggestions?
Note: I have no replicated volumes, only distributed ones.
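In case it helps whoever answers: the volume definitions under
/var/lib/glusterd/vols/ record the brick hosts, so I believe something like
this (untested, just a guess at a useful diagnostic) would show which volume
files still reference a given peer:

  # list volume-definition files that mention storage2
  grep -rl storage2 /var/lib/glusterd/vols/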
Thanks,
Brian.