I've run replace-brick on missing bricks before; it should still work.
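Something along these lines should do it. This is a sketch from memory with placeholder volume and brick names, so adjust for your setup:

    # the old brick's backing store is gone, so skip the data
    # migration step and force the commit straight away
    gluster volume replace-brick myvol server2:/bricks/old \
        server2:/bricks/new commit force

    # then trigger a full self-heal to repopulate the new brick
    gluster volume heal myvol full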

On the other hand, data corruption is the worst-case failure mode.
The one time I hit data corruption on a node, my final answer ended up being to rebuild the cluster from scratch and restore the best copy of the data I had (a mix of backups and live data).

On 01/10/2013 11:12 AM, Liang Ma wrote:

Thank you, Daniel, for your further comments.

Now I can remove the damaged zfs brick after rebooting the system. But then how do I join a new brick? I can't run gluster volume replace-brick because the old brick is gone, and I can't remove the old brick either because the volume's replica count is 2 (roughly what I tried is sketched below). So what is the right procedure for replacing a failed brick in a replicated gluster volume?
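For reference, my failed attempts looked roughly like this (placeholder names, not my real ones):

    # refused: the source brick no longer exists
    gluster volume replace-brick test-vol server2:/tank/brick-old \
        server2:/tank/brick-new start

    # refused: removing one brick would break the replica count of 2
    gluster volume remove-brick test-vol server2:/tank/brick-old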

Liang


On Thu, Jan 10, 2013 at 11:57 AM, Daniel Taylor <[email protected]> wrote:

    I'm not familiar with zfs in particular, but it should have given
    you a message saying why it won't unmount.

    In the worst case you can indeed remove the mount point from
    /etc/fstab and reboot. A hard reboot may be necessary in a case
    like this.
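    A sketch of what I'd try, hedged since I don't know zfs well;
    umount -l is generic, and I believe the zfs equivalents are:

        # generic: lazy-unmount a stubborn mount point
        umount -l /path/to/brick

        # zfs-specific (assuming the standard zfs tools):
        zfs unmount -f tank/brick     # force-unmount one dataset
        zpool export -f tank          # or detach the whole pool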


    On 01/10/2013 10:43 AM, Liang Ma wrote:


        Yes, I stopped the glusterfs service on the damaged system,
        but zfs still won't let me unmount the filesystem. Maybe I
        should try shutting down the entire system.


        On Wed, Jan 9, 2013 at 10:28 AM, Daniel Taylor
        <[email protected]> wrote:


            On 01/09/2013 08:31 AM, Liang Ma wrote:


                Hi Daniel,

                Ok, if gluster can't self-heal from this situation, I
                hope at least I can manually restore the volume using
                the good brick that is still available. So would you
                please tell me how I can "simply rebuild the
                filesystem and let gluster attempt to restore it from
                a *clean* filesystem"?


            Trimmed for space.

            You could do as Tom Pfaff suggests, but given the odds of
            data corruption carrying forward, I'd do the following
            (rough commands are sketched after the list):

            1. Shut down gluster on the damaged system.
            2. Unmount the damaged filesystem.
            3. Reformat the damaged filesystem as new (throwing away
               any potential corruption that might not get caught on
               a rebuild).
            4. Mount the new filesystem at the original mount point.
            5. Restart gluster.
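            A rough sketch with placeholder pool and mount-point names
            (I don't know your zfs layout, so adjust accordingly):

                service glusterd stop         # stop the gluster daemons
                zfs unmount -f tank/brick     # unmount the damaged fs
                zfs destroy tank/brick        # discard it entirely
                zfs create -o mountpoint=/export/brick tank/brick
                service glusterd start        # bring gluster back up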

            In the event of corruption due to hardware failure you'd
            be doing this on replacement hardware.
            The key is you have to have a functional filesystem for
            gluster to work with.


            --
            Daniel Taylor
            VP Operations, Vocal Laboratories, Inc
            [email protected]
            612-235-5711




    --
    Daniel Taylor
    VP Operations, Vocal Laboratories, Inc
    [email protected]
    612-235-5711




--
Daniel Taylor
VP Operations, Vocal Laboratories, Inc
[email protected]
612-235-5711

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
