Harry-
Thanks for the tip. My problem could well have been the same as yours. I have known for some time that "gluster peer status" doesn't give useful connection information but I didn't know about the "gluster volume status" commands; they must be new in version 3.3. I usually discover connection problems by seeing phrases like "disconnected" and "anomalies" in the logs. This has been happening more often since I upgraded to version 3.3, and I suspect it is being caused by the very high load experienced by some servers. I have seen this load problem discussed in other threads. The next time I attempt a rebalance operation I will run "gluster volume status all detail" first to check connectivity.

-Dan

On 08/08/2012 08:31 PM, Harry Mangalam wrote:
This sounds similar, though not identical, to a problem that I had recently (described here:
<http://gluster.org/pipermail/gluster-users/2012-August/011054.html>).
My problems were the result of starting this kind of rebalance with a server node appearing to be connected (via the 'gluster peer status' output) but not actually being connected, as shown by the 'gluster volume status all detail' output. Note especially the part that describes its online state.

------------------------------------------------------------------------------
Brick                : Brick pbs3ib:/bducgl
Port                 : 24018
Online               : N <<=====================
Pid                  : 20953
File System          : xfs


You may have already verified this, but what I did was to start a rebalance / fix-layout with a disconnected brick, and it went ahead and tried to do it, unsuccessfully as you might guess. But when I was finally able to reconnect the downed brick and restart the rebalance, it (astonishingly) was able to bring everything back. So props to the gluster team.
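
The check described above could be automated before starting a rebalance. What follows is only a minimal sketch, assuming the output of 'gluster volume status all detail' keeps the "Field : value" layout shown in the excerpt. The function names (offline_bricks, offline_bricks_live) are made up for illustration and are not part of gluster.

```python
import subprocess


def offline_bricks(status_text):
    """Return the bricks whose 'Online' field is 'N' in the given status text.

    Assumes each brick is reported as a run of 'Field : value' lines that
    starts with a 'Brick' line, as in the excerpt above.
    """
    offline = []
    current_brick = None
    for line in status_text.splitlines():
        # Split on the first colon only, so 'host:/path' values stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # separator rows and blank lines carry no field
        key = key.strip()
        fields = value.split()
        if key == "Brick":
            current_brick = value.strip()
        elif key == "Online" and fields and fields[0] == "N":
            offline.append(current_brick)
    return offline


def offline_bricks_live():
    """Run the gluster CLI and report offline bricks (needs gluster installed)."""
    result = subprocess.run(
        ["gluster", "volume", "status", "all", "detail"],
        capture_output=True,
        text=True,
        check=True,
    )
    return offline_bricks(result.stdout)
```

Calling offline_bricks_live() on one of the servers and refusing to start the rebalance when the returned list is non-empty would have caught the situation I described.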

hjm


On Wed, Aug 8, 2012 at 11:58 AM, Dan Bretherton <[email protected]> wrote:

    Hello All-
    I have noticed another problem after upgrading to version 3.3.  I
    am unable to do "gluster volume rebalance <VOLUME> fix-layout
    status" or "...fix-layout ... stop" after starting a rebalance
    operation with "gluster volume rebalance <VOLUME> fix-layout
    start".   The fix-layout operation seemed to be progressing
    normally on all the servers according to the log files, but all
    attempts to do "status" or "stop" result in the CLI usage message
    being returned.  The only references to the rebalance commands in
    the log files were these, which all the servers seem to have one
    or more of.

    [root@romulus glusterfs]# grep rebalance *.log
    etc-glusterfs-glusterd.vol.log:[2012-08-08 12:49:04.870709] W
    [socket.c:1512:__socket_proto_state_machine] 0-management: reading
    from socket failed. Error (Transport endpoint is not connected),
    peer (/var/lib/glusterd/vols/tracks/rebalance/cb21050d-05c2-42b3-8660-230954bab324.sock)
    tracks-rebalance.log:[2012-08-06 10:41:18.550241] I
    [graph.c:241:gf_add_cmdline_options] 0-tracks-dht: adding option
    'rebalance-cmd' for volume 'tracks-dht' with value '4'

    The volume name is "tracks" by the way.  I wanted to stop the
    rebalance operation because it seemed to be causing a very high
    load on some of the servers and had been running for several days.  I
    ended up having to manually kill the rebalance processes on all
    the servers followed by restarting glusterd.

    After that I found that one of the servers had
    "rebalance_status=4" in file
    /var/lib/glusterd/vols/tracks/node_state.info, whereas all the others had
    "rebalance_status=0".  I manually changed the '4' to '0' and
    restarted glusterd.  I don't know if this was a consequence of the
    way I had killed the rebalance operation or the cause of the
    strange behaviour.  I don't really want to start another rebalance
    going to test because the last one was so disruptive.
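
    Comparing the stored value across servers can be scripted rather than
    done by hand. This is only a rough sketch, assuming node_state.info
    holds plain "key=value" lines like the "rebalance_status=4" entry
    quoted above. The helper names are hypothetical, and the code does not
    interpret what the numeric values mean.

```python
def parse_rebalance_status(text):
    """Return the rebalance_status value found in node_state.info text, or None."""
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.strip() == "rebalance_status":
            return value.strip()
    return None


def read_rebalance_status(path):
    """Read the file at `path` and return its rebalance_status value, or None."""
    with open(path) as handle:
        return parse_rebalance_status(handle.read())
```

    Running read_rebalance_status("/var/lib/glusterd/vols/tracks/node_state.info")
    on each server and comparing the results would show which node
    disagrees with the rest.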

    Has anyone else experienced this problem since upgrading to 3.3?

    Regards,
    Dan.

    _______________________________________________
    Gluster-users mailing list
    [email protected]
    http://gluster.org/cgi-bin/mailman/listinfo/gluster-users




--
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
