Hi D,

We tried reproducing the issue with a similar setup but were unable to do so. We are still investigating it.

I have another follow-up question. You said that the repo exists only on s0? If that were the case, then bringing glusterd down on s0 only, deleting the repo and starting glusterd again would have removed it. The fact that the repo is restored as soon as glusterd restarts on s0 means that some other node(s) in the cluster also have that repo and are passing that information to the glusterd on s0 during the handshake. Could you please confirm whether any node apart from s0 has that particular repo (/var/lib/glusterd/vols/data-teste)? Thanks.
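
For example, a quick loop like the one below (just a rough sketch; it assumes passwordless SSH from one host and that the peers are named s0-s3, so please adjust hostnames and paths to your setup) would show which nodes still carry the repo:

for h in s0 s1 s2 s3; do echo "== $h =="; ssh $h 'ls -d /var/lib/glusterd/vols/data-teste 2>/dev/null || echo "not present"'; done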

Regards,
Avra

On 02/20/2017 06:51 PM, Gambit15 wrote:
Hi Avra,

On 20 February 2017 at 02:51, Avra Sengupta <[email protected]> wrote:

    Hi D,

    It seems you tried to take a clone of a snapshot when that
    snapshot was not activated.


Correct. As my command history shows, I then noticed the issue, checked the snapshot's status & activated it. I included this in the history just to clear up any doubts from the logs.

    However, in this scenario the cloned volume should not be in an
    inconsistent state. I will try to reproduce this and see if it's a
    bug. Meanwhile, could you please answer the following queries:
    1. How many nodes are in the cluster?


There are 4 nodes in a (2+1)x2 setup.
s0 replicates to s1, with an arbiter on s2, and s2 replicates to s3, with an arbiter on s0.
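
For reference, that layout corresponds to a create command along these lines (the brick paths here are illustrative, not the real ones):

gluster volume create data replica 3 arbiter 1 \
    s0:/gluster/data/brick s1:/gluster/data/brick s2:/gluster/data/arbiter \
    s2:/gluster/data/brick s3:/gluster/data/brick s0:/gluster/data/arbiter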

    2. How many bricks does the snapshot
    data-bck_GMT-2017.02.09-14.15.43 have?

6 bricks, including the 2 arbiters.

    3. Was the snapshot clone command issued from a node which did not
    have any bricks for the snapshot data-bck_GMT-2017.02.09-14.15.43?


All commands were issued from s0. All volumes have bricks on every node in the cluster.

    4. I see you tried to delete the new cloned volume. Did the new
    cloned volume land in this state after the failure to create the
    clone or the failure to delete it?


I noticed there was something wrong as soon as I created the clone. The clone command completed; however, I was then unable to do anything with it, because the clone didn't exist on s1-s3.


    If you want to remove the half-baked volume from the cluster,
    please proceed with the following steps.
    1. Bring down glusterd on all nodes by running the following
    command on every node:
    $ systemctl stop glusterd
    Verify that glusterd is down on all nodes by running the
    following command on every node:
    $ systemctl status glusterd
    2. Delete the following repo from all the nodes on which it
    exists:
    /var/lib/glusterd/vols/data-teste


The repo only exists on s0, but stopping glusterd on s0 alone & deleting the directory didn't work; the directory was restored as soon as glusterd was restarted. I haven't yet tried stopping glusterd on *all* nodes before doing this. I'll need to plan for that, as it'll take the entire cluster off the air.
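
When I do get a window, the plan would be roughly the following (just a sketch; it assumes passwordless SSH from an admin host and that the peers are named s0-s3):

for h in s0 s1 s2 s3; do ssh $h 'systemctl stop glusterd'; done
for h in s0 s1 s2 s3; do ssh $h 'systemctl is-active glusterd'; done    # should report "inactive"
for h in s0 s1 s2 s3; do ssh $h 'rm -rf /var/lib/glusterd/vols/data-teste'; done
for h in s0 s1 s2 s3; do ssh $h 'systemctl start glusterd'; done

Though if stopping glusterd only stops the management daemon and leaves the brick processes running, perhaps the impact won't be as bad as I feared?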

Thanks for the reply,
 Doug


    Regards,
    Avra


    On 02/16/2017 08:01 PM, Gambit15 wrote:
    Hey guys,
     I tried to create a new volume from a cloned snapshot yesterday,
    however something went wrong during the process & I'm now stuck
    with the new volume being created on the server I ran the
    commands on (s0), but not on the rest of the peers. I'm unable to
    delete this new volume from the server, as it doesn't exist on
    the peers.

    What do I do?
    Any insights into what may have gone wrong?

    CentOS 7.3.1611
    Gluster 3.8.8

    The command history & extract from etc-glusterfs-glusterd.vol.log
    are included below.

    gluster volume list
    gluster snapshot list
    gluster snapshot clone data-teste data-bck_GMT-2017.02.09-14.15.43
    gluster volume status data-teste
    gluster volume delete data-teste
    gluster snapshot create teste data
    gluster snapshot clone data-teste teste_GMT-2017.02.15-12.44.04
    gluster snapshot status
    gluster snapshot activate teste_GMT-2017.02.15-12.44.04
    gluster snapshot clone data-teste teste_GMT-2017.02.15-12.44.04


    [2017-02-15 12:43:21.667403] I [MSGID: 106499]
    [glusterd-handler.c:4349:__glusterd_handle_status_volume]
    0-management: Received status volume req for volume data-teste
    [2017-02-15 12:43:21.682530] E [MSGID: 106301]
    [glusterd-syncop.c:1297:gd_stage_op_phase] 0-management: Staging
    of operation 'Volume Status' failed on localhost : Volume
    data-teste is not started
    [2017-02-15 12:43:43.633031] I [MSGID: 106495]
    [glusterd-handler.c:3128:__glusterd_handle_getwd] 0-glusterd:
    Received getwd req
    [2017-02-15 12:43:43.640597] I [run.c:191:runner_log]
    (-->/usr/lib64/glusterfs/3.8.8/xlator/mgmt/glusterd.so(+0xcc4b2)
    [0x7ffb396a14b2]
    -->/usr/lib64/glusterfs/3.8.8/xlator/mgmt/glusterd.so(+0xcbf65)
    [0x7ffb396a0f65] -->/lib64/libglusterfs.so.0(runner_log+0x115)
    [0x7ffb44ec31c5] ) 0-management: Ran script:
    /var/lib/glusterd/hooks/1/delete/post/S57glusterfind-delete-post
    --volname=data-teste
    [2017-02-15 13:05:20.103423] E [MSGID: 106122]
    [glusterd-snapshot.c:2397:glusterd_snapshot_clone_prevalidate]
    0-management: Failed to pre validate
    [2017-02-15 13:05:20.103464] E [MSGID: 106443]
    [glusterd-snapshot.c:2413:glusterd_snapshot_clone_prevalidate]
    0-management: One or more bricks are not running. Please run
    snapshot status command to see brick status.
    Please start the stopped brick and then issue snapshot clone command
    [2017-02-15 13:05:20.103481] W [MSGID: 106443]
    [glusterd-snapshot.c:8563:glusterd_snapshot_prevalidate]
    0-management: Snapshot clone pre-validation failed
    [2017-02-15 13:05:20.103492] W [MSGID: 106122]
    [glusterd-mgmt.c:167:gd_mgmt_v3_pre_validate_fn] 0-management:
    Snapshot Prevalidate Failed
    [2017-02-15 13:05:20.103503] E [MSGID: 106122]
    [glusterd-mgmt.c:884:glusterd_mgmt_v3_pre_validate] 0-management:
    Pre Validation failed for operation Snapshot on local node
    [2017-02-15 13:05:20.103514] E [MSGID: 106122]
    [glusterd-mgmt.c:2243:glusterd_mgmt_v3_initiate_snap_phases]
    0-management: Pre Validation Failed
    [2017-02-15 13:05:20.103531] E [MSGID: 106027]
    [glusterd-snapshot.c:8118:glusterd_snapshot_clone_postvalidate]
    0-management: unable to find clone data-teste volinfo
    [2017-02-15 13:05:20.103542] W [MSGID: 106444]
    [glusterd-snapshot.c:9063:glusterd_snapshot_postvalidate]
    0-management: Snapshot create post-validation failed
    [2017-02-15 13:05:20.103561] W [MSGID: 106121]
    [glusterd-mgmt.c:351:gd_mgmt_v3_post_validate_fn] 0-management:
    postvalidate operation failed
    [2017-02-15 13:05:20.103572] E [MSGID: 106121]
    [glusterd-mgmt.c:1660:glusterd_mgmt_v3_post_validate]
    0-management: Post Validation failed for operation Snapshot on
    local node
    [2017-02-15 13:05:20.103582] E [MSGID: 106122]
    [glusterd-mgmt.c:2363:glusterd_mgmt_v3_initiate_snap_phases]
    0-management: Post Validation Failed
    [2017-02-15 13:11:15.862858] W [MSGID: 106057]
    [glusterd-snapshot-utils.c:410:glusterd_snap_volinfo_find]
    0-management: Snap volume
    c3ceae3889484e96ab8bed69593cf6d3.s0.run-gluster-snaps-c3ceae3889484e96ab8bed69593cf6d3-brick1-data-brick
    not found [Invalid argument]
    [2017-02-15 13:11:16.314759] I [MSGID: 106143]
    [glusterd-pmap.c:250:pmap_registry_bind] 0-pmap: adding brick
    /run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick1/data/brick
    on port 49452
    [2017-02-15 13:11:16.316090] I
    [rpc-clnt.c:1046:rpc_clnt_connection_init] 0-management: setting
    frame-timeout to 600
    [2017-02-15 13:11:16.348867] W [MSGID: 106057]
    [glusterd-snapshot-utils.c:410:glusterd_snap_volinfo_find]
    0-management: Snap volume
    c3ceae3889484e96ab8bed69593cf6d3.s0.run-gluster-snaps-c3ceae3889484e96ab8bed69593cf6d3-brick6-data-arbiter
    not found [Invalid argument]
    [2017-02-15 13:11:16.558878] I [MSGID: 106143]
    [glusterd-pmap.c:250:pmap_registry_bind] 0-pmap: adding brick
    /run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick6/data/arbiter
    on port 49453
    [2017-02-15 13:11:16.559883] I
    [rpc-clnt.c:1046:rpc_clnt_connection_init] 0-management: setting
    frame-timeout to 600
    [2017-02-15 13:11:23.279721] E [MSGID: 106030]
    [glusterd-snapshot.c:4736:glusterd_take_lvm_snapshot]
    0-management: taking snapshot of the brick
    (/run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick1/data/brick)
    of device
    /dev/mapper/v0.dc0.cte--g0-c3ceae3889484e96ab8bed69593cf6d3_0 failed
    [2017-02-15 13:11:23.279790] E [MSGID: 106030]
    [glusterd-snapshot.c:5135:glusterd_take_brick_snapshot]
    0-management: Failed to take snapshot of brick
    s0:/run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick1/data/brick
    [2017-02-15 13:11:23.279806] E [MSGID: 106030]
    [glusterd-snapshot.c:6484:glusterd_take_brick_snapshot_task]
    0-management: Failed to take backend snapshot for brick
    s0:/run/gluster/snaps/data-teste/brick1/data/brick volume(data-teste)
    [2017-02-15 13:11:23.286678] E [MSGID: 106030]
    [glusterd-snapshot.c:4736:glusterd_take_lvm_snapshot]
    0-management: taking snapshot of the brick
    (/run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick6/data/arbiter)
    of device
    /dev/mapper/v0.dc0.cte--g0-c3ceae3889484e96ab8bed69593cf6d3_1 failed
    [2017-02-15 13:11:23.286735] E [MSGID: 106030]
    [glusterd-snapshot.c:5135:glusterd_take_brick_snapshot]
    0-management: Failed to take snapshot of brick
    s0:/run/gluster/snaps/c3ceae3889484e96ab8bed69593cf6d3/brick6/data/arbiter
    [2017-02-15 13:11:23.286749] E [MSGID: 106030]
    [glusterd-snapshot.c:6484:glusterd_take_brick_snapshot_task]
    0-management: Failed to take backend snapshot for brick
    s0:/run/gluster/snaps/data-teste/brick6/data/arbiter
    volume(data-teste)
    [2017-02-15 13:11:23.286793] E [MSGID: 106030]
    [glusterd-snapshot.c:6626:glusterd_schedule_brick_snapshot]
    0-management: Failed to create snapshot
    [2017-02-15 13:11:23.286813] E [MSGID: 106441]
    [glusterd-snapshot.c:6796:glusterd_snapshot_clone_commit]
    0-management: Failed to take backend snapshot data-teste
    [2017-02-15 13:11:25.530666] E [MSGID: 106442]
    [glusterd-snapshot.c:8308:glusterd_snapshot] 0-management: Failed
    to clone snapshot
    [2017-02-15 13:11:25.530721] W [MSGID: 106123]
    [glusterd-mgmt.c:272:gd_mgmt_v3_commit_fn] 0-management: Snapshot
    Commit Failed
    [2017-02-15 13:11:25.530735] E [MSGID: 106123]
    [glusterd-mgmt.c:1427:glusterd_mgmt_v3_commit] 0-management:
    Commit failed for operation Snapshot on local node
    [2017-02-15 13:11:25.530749] E [MSGID: 106123]
    [glusterd-mgmt.c:2304:glusterd_mgmt_v3_initiate_snap_phases]
    0-management: Commit Op Failed
    [2017-02-15 13:11:25.532312] E [MSGID: 106027]
    [glusterd-snapshot.c:8118:glusterd_snapshot_clone_postvalidate]
    0-management: unable to find clone data-teste volinfo
    [2017-02-15 13:11:25.532339] W [MSGID: 106444]
    [glusterd-snapshot.c:9063:glusterd_snapshot_postvalidate]
    0-management: Snapshot create post-validation failed
    [2017-02-15 13:11:25.532353] W [MSGID: 106121]
    [glusterd-mgmt.c:351:gd_mgmt_v3_post_validate_fn] 0-management:
    postvalidate operation failed
    [2017-02-15 13:11:25.532367] E [MSGID: 106121]
    [glusterd-mgmt.c:1660:glusterd_mgmt_v3_post_validate]
    0-management: Post Validation failed for operation Snapshot on
    local node
    [2017-02-15 13:11:25.532381] E [MSGID: 106122]
    [glusterd-mgmt.c:2363:glusterd_mgmt_v3_initiate_snap_phases]
    0-management: Post Validation Failed
    [2017-02-15 13:29:53.779020] E [MSGID: 106062]
    [glusterd-snapshot-utils.c:2391:glusterd_snap_create_use_rsp_dict]
    0-management: failed to get snap UUID
    [2017-02-15 13:29:53.779073] E [MSGID: 106099]
    [glusterd-snapshot-utils.c:2507:glusterd_snap_use_rsp_dict]
    0-glusterd: Unable to use rsp dict
    [2017-02-15 13:29:53.779096] E [MSGID: 106108]
    [glusterd-mgmt.c:1305:gd_mgmt_v3_commit_cbk_fn] 0-management:
    Failed to aggregate response from  node/brick
    [2017-02-15 13:29:53.779136] E [MSGID: 106116]
    [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management:
    Commit failed on s3. Please check log file for details.
    [2017-02-15 13:29:54.136196] E [MSGID: 106116]
    [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management:
    Commit failed on s1. Please check log file for details.
    The message "E [MSGID: 106108]
    [glusterd-mgmt.c:1305:gd_mgmt_v3_commit_cbk_fn] 0-management:
    Failed to aggregate response from  node/brick" repeated 2 times
    between [2017-02-15 13:29:53.779096] and [2017-02-15 13:29:54.535080]
    [2017-02-15 13:29:54.535098] E [MSGID: 106116]
    [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management:
    Commit failed on s2. Please check log file for details.
    [2017-02-15 13:29:54.535320] E [MSGID: 106123]
    [glusterd-mgmt.c:1490:glusterd_mgmt_v3_commit] 0-management:
    Commit failed on peers
    [2017-02-15 13:29:54.535370] E [MSGID: 106123]
    [glusterd-mgmt.c:2304:glusterd_mgmt_v3_initiate_snap_phases]
    0-management: Commit Op Failed
    [2017-02-15 13:29:54.539708] E [MSGID: 106116]
    [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management:
    Post Validation failed on s1. Please check log file for details.
    [2017-02-15 13:29:54.539797] E [MSGID: 106116]
    [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management:
    Post Validation failed on s3. Please check log file for details.
    [2017-02-15 13:29:54.539856] E [MSGID: 106116]
    [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management:
    Post Validation failed on s2. Please check log file for details.
    [2017-02-15 13:29:54.540224] E [MSGID: 106121]
    [glusterd-mgmt.c:1713:glusterd_mgmt_v3_post_validate]
    0-management: Post Validation failed on peers
    [2017-02-15 13:29:54.540256] E [MSGID: 106122]
    [glusterd-mgmt.c:2363:glusterd_mgmt_v3_initiate_snap_phases]
    0-management: Post Validation Failed
    The message "E [MSGID: 106062]
    [glusterd-snapshot-utils.c:2391:glusterd_snap_create_use_rsp_dict]
    0-management: failed to get snap UUID" repeated 2 times between [2017-02-15
    13:29:53.779020] and [2017-02-15 13:29:54.535075]
    The message "E [MSGID: 106099]
    [glusterd-snapshot-utils.c:2507:glusterd_snap_use_rsp_dict]
    0-glusterd: Unable to use rsp dict" repeated 2 times between
    [2017-02-15 13:29:53.779073] and [2017-02-15 13:29:54.535078]
    [2017-02-15 13:31:14.285666] I [MSGID: 106488]
    [glusterd-handler.c:1537:__glusterd_handle_cli_get_volume]
    0-management: Received get vol req
    [2017-02-15 13:32:17.827422] E [MSGID: 106027]
    [glusterd-handler.c:4670:glusterd_get_volume_opts] 0-management:
    Volume cluster.locking-scheme does not exist
    [2017-02-15 13:34:02.635762] E [MSGID: 106116]
    [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Pre
    Validation failed on s1. Volume data-teste does not exist
    [2017-02-15 13:34:02.635838] E [MSGID: 106116]
    [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Pre
    Validation failed on s2. Volume data-teste does not exist
    [2017-02-15 13:34:02.635889] E [MSGID: 106116]
    [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Pre
    Validation failed on s3. Volume data-teste does not exist
    [2017-02-15 13:34:02.636092] E [MSGID: 106122]
    [glusterd-mgmt.c:947:glusterd_mgmt_v3_pre_validate] 0-management:
    Pre Validation failed on peers
    [2017-02-15 13:34:02.636132] E [MSGID: 106122]
    [glusterd-mgmt.c:2009:glusterd_mgmt_v3_initiate_all_phases]
    0-management: Pre Validation Failed
    [2017-02-15 13:34:20.313228] E [MSGID: 106153]
    [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging
    failed on s2. Error: Volume data-teste does not exist
    [2017-02-15 13:34:20.313320] E [MSGID: 106153]
    [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging
    failed on s1. Error: Volume data-teste does not exist
    [2017-02-15 13:34:20.313377] E [MSGID: 106153]
    [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging
    failed on s3. Error: Volume data-teste does not exist
    [2017-02-15 13:34:36.796455] E [MSGID: 106153]
    [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging
    failed on s1. Error: Volume data-teste does not exist
    [2017-02-15 13:34:36.796830] E [MSGID: 106153]
    [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging
    failed on s3. Error: Volume data-teste does not exist
    [2017-02-15 13:34:36.796896] E [MSGID: 106153]
    [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging
    failed on s2. Error: Volume data-teste does not exist

    Many thanks!
     D



_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
