That makes sense. The system is made of four data arrays with hardware RAID 6, and then the distributed volume on top. I honestly don't know how that works, but the previous administrator said we had redundancy. I'm hoping there is a way to bypass the safeguard of migrating data when removing a brick from the volume, which, in my beginner's mind, would be a straightforward way of remedying the problem. Hopefully once the empty bricks are removed, the "missing" data will be visible again in the volume.
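If that's a sane approach, here is roughly what I have in mind. This is untested, and "newserver" and "dataX" below are just the placeholders from my earlier message, standing in for each of the four new bricks:

    # drop one of the empty, wrongly named bricks without migrating data
    gluster volume remove-brick vol.name newserver:/bricks/dataX force

    # after clearing the leftover metadata on that brick, re-add it
    # under the correct path
    gluster volume add-brick vol.name newserver:/bricks/dataX/vol.name

    # then confirm the volume sees all 20 bricks and the extra 400T
    gluster volume info vol.name
    df -h /vol.name

Does "force" on remove-brick actually skip the migration step, or is there a safer way to tell gluster those bricks are empty?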
On Wed, Feb 27, 2019 at 3:59 PM Jim Kinney <[email protected]> wrote:

> Keep in mind that gluster is a metadata process. It doesn't really touch
> the actual volume files. The exception is the .glusterfs and .trashcan
> folders in the very top directory of the gluster volume.
>
> When you create a gluster volume from bricks, it doesn't format the
> filesystem. It uses what's already there.
>
> So if you remove a volume and all its bricks, you've not deleted data.
>
> That said, if you are using anything but replicated bricks, which is what
> I use exclusively for my needs, then reassembling them into a new volume
> with the correct name might be tricky. By listing the bricks in the exact
> same order as they were listed when creating the wrongly named volume,
> the correctly named volume should use the same method to put data on the
> drives as before and not scramble anything.
>
> On Wed, 2019-02-27 at 14:24 -0500, Tami Greene wrote:
>
> > I sent this and realized I hadn't registered. My apologies for the
> > duplication.
> >
> > Subject: Added bricks with wrong name and now need to remove them
> > without destroying volume.
> > To: <[email protected]>
> >
> > Yes, I broke it. Now I need help fixing it.
> >
> > I have an existing Gluster volume, spread over 16 bricks and 4 servers;
> > 1.5P of space with 49% currently used. Added an additional 4 bricks and
> > a server as we expect a large influx of data in the next 4 to 6 months.
> > The system had been established by my predecessor, who is no longer
> > here.
> >
> > This was my first solo addition of bricks to gluster.
> >
> > Everything went smoothly until "gluster volume add-brick Volume
> > newserver:/bricks/dataX/vol.name". (I don't have the exact response, as
> > I worked on this for almost 5 hours last night.) Unable to add-brick as
> > "it is already mounted", or something to that effect.
> >
> > Double checked my instructions and the names of the bricks. Everything
> > seemed correct. Tried to add again, adding "force." Again, "unable to
> > add-brick."
> >
> > Because of the keyword (in my mind) "mounted" in the error, I checked
> > /etc/fstab, where the name of the mount point is simply /bricks/dataX.
> > This convention was the same across all servers, so I thought I had
> > discovered an error in my notes and changed the name to
> > newserver:/bricks/dataX.
> >
> > Still had to use force, but the bricks were added.
> >
> > Restarted the gluster volume vol.name. No errors.
> >
> > Rebooted, but /vol.name did not mount on reboot as /etc/fstab
> > instructs. So I attempted to mount manually and discovered I had a big
> > mess on my hands: "Transport endpoint not connected", in addition to
> > other messages.
> >
> > Discovered an issue between the certificates and the auth.ssl-allow
> > list because of the hostname of the new server. I made the correction
> > and /vol.name mounted.
> >
> > However, df -h indicated the 4 new bricks were not being seen, as 400T
> > was missing from what should have been available.
> >
> > Thankfully, I could add something to vol.name on one machine and see it
> > on another machine, and I wrongly assumed the volume was operational,
> > even if the new bricks were not recognized.
> >
> > So I tried to correct the main issue with
> >
> > gluster volume remove vol.name newserver/bricks/dataX/
> >
> > received a prompt – data will be migrated before the brick is removed,
> > continue? (or something to that effect) – and I started the process,
> > thinking it wouldn't take long because there is no data.
> >
> > After 10 minutes and no apparent progress on the process, I did panic,
> > thinking worst-case scenario – it is writing zeros over my data.
> >
> > Executed the stop command and there was still no progress, and I assume
> > it was due to there being no data on the brick to be removed, causing
> > the program to hang. Found the process ID and killed it.
> >
> > This morning, while all clients and servers can access /vol.name, not
> > all of the data is present. I can find it on the cluster, but users
> > cannot reach it. I am, again, assuming it is because of the 4 bricks
> > that have been added, but aren't really a part of the volume because of
> > their incorrect name.
> >
> > So – how do I proceed from here?
> >
> > 1. Remove the 4 empty bricks from the volume without damaging data.
> >
> > 2. Correctly clear any metadata about these 4 bricks ONLY so they may
> > be added correctly.
> >
> > If this doesn't restore the volume to full functionality, I'll write
> > another post if I cannot find the answer in my notes or online.
> >
> > Tami
>
> --
> James P. Kinney III
>
> Every time you stop a school, you will have to build a jail. What you
> gain at one end you lose at the other. It's like feeding a dog on his
> own tail. It won't fatten the dog. - Speech 11/23/1900 Mark Twain
> http://heretothereideas.blogspot.com/

--
Tami
_______________________________________________
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
