Here is some further information on this issue:

The version of gluster we are using is 3.7.6.

Also, the error I found in the command history (cmd_history.log) is:
[2017-05-26 04:28:28.332700]  : volume remove-brick hpcscratch 
cri16fs001-ib:/data/brick1/scratch commit : FAILED : Commit failed on 
cri16fs003-ib. Please check log file for details.
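
For context, that commit was the final step of the usual remove-brick workflow; the sequence would have looked roughly like this (volume and brick names taken from the error above):

    # drain the data off the brick being removed
    gluster volume remove-brick hpcscratch cri16fs001-ib:/data/brick1/scratch start

    # poll until the migration reports "completed"
    gluster volume remove-brick hpcscratch cri16fs001-ib:/data/brick1/scratch status

    # detach the brick from the volume definition (the step that failed on cri16fs003-ib)
    gluster volume remove-brick hpcscratch cri16fs001-ib:/data/brick1/scratch commit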

I did not notice this at the time and attempted to remove the next brick in order 
to migrate the data off the system. This left the servers in the following state:

fs001 - /var/lib/glusterd/vols/hpcscratch/info

type=0
count=3
status=1
sub_count=0
stripe_count=1
replica_count=1
disperse_count=0
redundancy_count=0
version=42
transport-type=0
volume-id=80b8eeed-1e72-45b9-8402-e01ae0130105
…
op-version=30700
client-op-version=3
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
server.event-threads=8
performance.client-io-threads=on
client.event-threads=8
performance.cache-size=32MB
performance.readdir-ahead=on
brick-0=cri16fs001-ib:-data-brick2-scratch
brick-1=cri16fs003-ib:-data-brick5-scratch
brick-2=cri16fs003-ib:-data-brick6-scratch


fs003 - /var/lib/glusterd/vols/hpcscratch/info

type=0
count=4
status=1
sub_count=0
stripe_count=1
replica_count=1
disperse_count=0
redundancy_count=0
version=35
transport-type=0
volume-id=80b8eeed-1e72-45b9-8402-e01ae0130105
…
op-version=30700
client-op-version=3
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
performance.readdir-ahead=on
performance.cache-size=32MB
client.event-threads=8
performance.client-io-threads=on
server.event-threads=8
brick-0=cri16fs001-ib:-data-brick1-scratch
brick-1=cri16fs001-ib:-data-brick2-scratch
brick-2=cri16fs003-ib:-data-brick5-scratch
brick-3=cri16fs003-ib:-data-brick6-scratch
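
Diffing the two copies of the volume definition makes the mismatch easy to see; for example (assuming ssh access between the nodes):

    # compare the volume definition as each glusterd sees it
    diff <(ssh cri16fs001-ib cat /var/lib/glusterd/vols/hpcscratch/info) \
         <(ssh cri16fs003-ib cat /var/lib/glusterd/vols/hpcscratch/info)

fs001 is at version=42 with three bricks (brick1 already gone), while fs003 is still at version=35 and lists all four bricks, which is consistent with the commit failing on fs003.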


fs001 - /var/lib/glusterd/vols/hpcscratch/node_state.info

rebalance_status=5
status=4
rebalance_op=0
rebalance-id=00000000-0000-0000-0000-000000000000
brick1=cri16fs001-ib:/data/brick2/scratch
count=1


fs003 - /var/lib/glusterd/vols/hpcscratch/node_state.info

rebalance_status=1
status=0
rebalance_op=9
rebalance-id=0184577f-eb64-4af9-924d-91ead0605a1e
brick1=cri16fs001-ib:/data/brick1/scratch
count=1
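
Assuming fs001 now has the authoritative copy of the volume definition, would it be reasonable to recover by syncing the volume directory over to fs003 while glusterd is down there? Something along these lines is what I have in mind (sketch only):

    # on fs003, with glusterd stopped: back up the stale volume definition first
    cp -a /var/lib/glusterd/vols/hpcscratch /root/hpcscratch-vols-backup

    # pull the volume directory from fs001, which reflects the committed removal
    rsync -a --delete cri16fs001-ib:/var/lib/glusterd/vols/hpcscratch/ \
          /var/lib/glusterd/vols/hpcscratch/

    # start glusterd again and verify the volume
    systemctl start glusterd    # or: service glusterd start
    gluster volume info hpcscratch

Or is there a cleaner, supported way to bring fs003 back in line?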


-- 
Mike Jarsulic


On 5/26/17, 8:22 AM, "Jarsulic, Michael [CRI]" <[email protected]> wrote:

    Recently, I had some problems with the OS hard drives in my glusterd servers 
and took one of my systems down for maintenance. The first step was to remove one 
of the bricks (brick1) hosted on that server (fs001). The data migration completed 
successfully last night. After that, I went to commit the change and the commit 
failed. Since then, glusterd will not start on one of my servers (fs003). When I 
check the glusterd logs on fs003, I see the following errors every time glusterd 
starts:
    
    [2017-05-26 04:37:21.358932] I [MSGID: 100030] [glusterfsd.c:2318:main] 
0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6 (args: 
/usr/sbin/glusterd --pid-file=/var/run/glusterd.pid)
    [2017-05-26 04:37:21.382630] I [MSGID: 106478] [glusterd.c:1350:init] 
0-management: Maximum allowed open file descriptors set to 65536
    [2017-05-26 04:37:21.382712] I [MSGID: 106479] [glusterd.c:1399:init] 
0-management: Using /var/lib/glusterd as working directory
    [2017-05-26 04:37:21.422858] I [MSGID: 106228] 
[glusterd.c:433:glusterd_check_gsync_present] 0-glusterd: geo-replication 
module not installed in the system [No such file or directory]
    [2017-05-26 04:37:21.450123] I [MSGID: 106513] 
[glusterd-store.c:2047:glusterd_restore_op_version] 0-glusterd: retrieved 
op-version: 30706
    [2017-05-26 04:37:21.463812] E [MSGID: 101032] 
[store.c:434:gf_store_handle_retrieve] 0-: Path corresponding to 
/var/lib/glusterd/vols/hpcscratch/bricks/cri16fs001-ib:-data-brick1-scratch. 
[No such file or directory]
    [2017-05-26 04:37:21.463866] E [MSGID: 106201] 
[glusterd-store.c:3042:glusterd_store_retrieve_volumes] 0-management: Unable to 
restore volume: hpcscratch
    [2017-05-26 04:37:21.463919] E [MSGID: 101019] [xlator.c:428:xlator_init] 
0-management: Initialization of volume 'management' failed, review your volfile 
again
    [2017-05-26 04:37:21.463943] E [graph.c:322:glusterfs_graph_init] 
0-management: initializing translator failed
    [2017-05-26 04:37:21.463970] E [graph.c:661:glusterfs_graph_activate] 
0-graph: init failed
    [2017-05-26 04:37:21.466703] W [glusterfsd.c:1236:cleanup_and_exit] 
(-->/usr/sbin/glusterd(glusterfs_volumes_init+0xda) [0x405cba] 
-->/usr/sbin/glusterd(glusterfs_process_volfp+0x116) [0x405b96] 
-->/usr/sbin/glusterd(cleanup_and_exit+0x65) [0x4059d5] ) 0-: received signum 
(0), shutting down
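    
    The path it cannot find appears to be the on-disk definition of the brick I 
just removed. Listing what glusterd still has for the volume on fs003, for example:
    
        ls -l /var/lib/glusterd/vols/hpcscratch/bricks/
        grep '^brick' /var/lib/glusterd/vols/hpcscratch/info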
    
    The volume is distribute-only. It looks to me like glusterd on fs003 still 
expects brick1 on fs001 to be part of the volume. Is there any way to recover 
from this? Is there any more information I can provide?
    
    
    --
    Mike Jarsulic
    

_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
