Harry-
Thanks for the tip. My problem could well have been the same as yours.
I have known for some time that "gluster peer status" doesn't give
useful connection information but I didn't know about the "gluster
volume status" commands; they must be new in version 3.3. I usually
discover connection problems by seeing phrases like "disconnected" and
"anomalies" in the logs. This has been happening more often since I
upgraded to version 3.3, and I suspect it is being caused by the very
high load experienced by some servers. I have seen this load problem
discussed in other threads. The next time I attempt a rebalance
operation I will run "gluster volume status all detail" first to check
connectivity.
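
For what it is worth, the log check described above can be sketched in a few lines of Python. This is only an illustration: the two phrases come from this message, while the function name `count_phrases` and the log directory `/var/log/glusterfs` are assumptions that should be adjusted to the local setup.

```python
from collections import Counter
import glob

# Phrases taken from the message above; extend the tuple as needed.
PHRASES = ("disconnected", "anomalies")


def count_phrases(lines, phrases=PHRASES):
    """Count how many log lines mention each phrase (case-insensitive)."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for phrase in phrases:
            if phrase in lowered:
                counts[phrase] += 1
    return counts


if __name__ == "__main__":
    # Assumed log location; change it to wherever the glusterfs logs live.
    for path in glob.glob("/var/log/glusterfs/*.log"):
        with open(path, errors="replace") as handle:
            counts = count_phrases(handle)
        if counts:
            print(path, dict(counts))
```

Running it on each server prints only the log files that contain at least one of the phrases, which makes it easier to see whether the messages cluster on the heavily loaded servers.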
-Dan
On 08/08/2012 08:31 PM, Harry Mangalam wrote:
This sounds similar, though not identical, to a problem that I had
recently (described here:
<http://gluster.org/pipermail/gluster-users/2012-August/011054.html>).
My problems were the result of starting this kind of
rebalance with a server node appearing to be connected (via the
'gluster peer status' output) but not actually being connected, as
shown by the
'gluster volume status all detail' output. Note especially the part
that describes its online state.
------------------------------------------------------------------------------
Brick : Brick pbs3ib:/bducgl
Port : 24018
Online : N <<=====================
Pid : 20953
File System : xfs
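
A minimal sketch of how that check could be automated before starting a rebalance, assuming the output uses the "Field : value" layout shown in the excerpt above. The function name `offline_bricks` and the `SAMPLE` text are illustrative only; real output contains more fields and may differ between versions.

```python
def offline_bricks(status_text):
    """Return the bricks whose 'Online' field is anything other than 'Y'."""
    offline = []
    current = None
    for raw in status_text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # separator rows and blank lines carry no field
        key = key.strip()
        value = value.strip()
        if key == "Brick":
            # The excerpt shows the value as "Brick host:/path".
            if value.startswith("Brick "):
                value = value[len("Brick "):]
            current = value
        elif key == "Online" and current is not None:
            # Look at the first token only, so trailing notes are ignored.
            state = value.split()[0] if value else ""
            if state != "Y":
                offline.append(current)
    return offline


# Text copied from the excerpt above, used here as sample input.
SAMPLE = """\
Brick : Brick pbs3ib:/bducgl
Port : 24018
Online : N
Pid : 20953
File System : xfs
"""

if __name__ == "__main__":
    print(offline_bricks(SAMPLE))
```

In practice the text would come from saving the output of 'gluster volume status all detail' to a file or capturing it in the script; a non-empty result is a sign that the rebalance should wait.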
You may have already verified this, but what I did was to start a
rebalance / fix-layout with a disconnected brick and it went ahead and
tried to do it, unsuccessfully as you might guess. But when I
finally was able to reconnect the downed brick, and restart the
rebalance, it (astonishingly) was able to bring everything back. So
props to the gluster team.
hjm
On Wed, Aug 8, 2012 at 11:58 AM, Dan Bretherton
<[email protected]> wrote:
Hello All-
I have noticed another problem after upgrading to version 3.3. I
am unable to do "gluster volume rebalance <VOLUME> fix-layout
status" or "...fix-layout ... stop" after starting a rebalance
operation with "gluster volume rebalance <VOLUME> fix-layout
start". The fix-layout operation seemed to be progressing
normally on all the servers according to the log files, but all
attempts to do "status" or "stop" result in the CLI usage message
being returned. The only references to the rebalance commands in
the log files were these, and every server seems to have one
or more of them:
[root@romulus glusterfs]# grep rebalance *.log
etc-glusterfs-glusterd.vol.log:[2012-08-08 12:49:04.870709] W
[socket.c:1512:__socket_proto_state_machine] 0-management: reading
from socket failed. Error (Transport endpoint is not connected),
peer
(/var/lib/glusterd/vols/tracks/rebalance/cb21050d-05c2-42b3-8660-230954bab324.sock)
tracks-rebalance.log:[2012-08-06 10:41:18.550241] I
[graph.c:241:gf_add_cmdline_options] 0-tracks-dht: adding option
'rebalance-cmd' for volume 'tracks-dht' with value '4'
The volume name is "tracks" by the way. I wanted to stop the
rebalance operation because it seemed to be causing a very high
load on some of the servers and had been running for several days. I
ended up having to manually kill the rebalance processes on all
the servers followed by restarting glusterd.
After that I found that one of the servers had
"rebalance_status=4" in file
/var/lib/glusterd/vols/tracks/node_state.info, whereas all the others had
"rebalance_status=0". I manually changed the '4' to '0' and
restarted glusterd. I don't know if this was a consequence of the
way I had killed the rebalance operation or the cause of the
strange behaviour. I don't really want to start another rebalance
just to test this, because the last one was so disruptive.
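
A rough sketch of how the "rebalance_status" values could be compared across servers without editing anything, assuming node_state.info holds plain key=value lines as the setting quoted above suggests. The function names and the server names in the example are hypothetical, and the sketch makes no claim about what the values 0 and 4 mean.

```python
from collections import Counter


def rebalance_status(node_state_text):
    """Return the rebalance_status value from node_state.info text, or None."""
    for line in node_state_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "rebalance_status":
            return value.strip()
    return None


def odd_ones_out(status_by_server):
    """Return the servers whose value differs from the most common one."""
    if not status_by_server:
        return {}
    common, _ = Counter(status_by_server.values()).most_common(1)[0]
    return {
        server: value
        for server, value in status_by_server.items()
        if value != common
    }


if __name__ == "__main__":
    # Hypothetical server names; the file contents would be collected
    # from /var/lib/glusterd/vols/tracks/node_state.info on each server.
    collected = {
        "server1": "rebalance_status=0\n",
        "server2": "rebalance_status=0\n",
        "server3": "rebalance_status=4\n",
    }
    statuses = {name: rebalance_status(text) for name, text in collected.items()}
    print(odd_ones_out(statuses))
```

This only reports which servers disagree with the majority; whether a mismatch should be corrected by hand, as was done here, is a separate question.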
Has anyone else experienced this problem since upgrading to 3.3?
Regards,
Dan.
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
--
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)