Re: [Gluster-users] Backups

Alvin Starr Thu, 23 Mar 2017 14:53:22 -0700

That's true and it can last much longer than days.

I have a client with some data sets that take months to copy, and they are not the biggest data user in the world.

The biggest problem with backups is that some day you may need to restore them.




On 03/23/2017 04:29 PM, Gandalf Corvotempesta wrote:

Yes, but the biggest issue is how to recover.

You'll need to recover the whole storage, not a single snapshot, and this can last for days.

On 23 Mar 2017 9:24 PM, "Alvin Starr" <[email protected]> wrote:


    For volume backups you need something like snapshots.

    If you take a snapshot A of a live volume L, that snapshot stays at
    that moment in time and you can rsync it to another system or
    use something like deltacp.pl to copy it.

    The usual process is to delete the snapshot once it's copied and
    then repeat the process when the next backup is required.

    That process does require rsync/deltacp to read the complete
    volume on both systems which can take a long time.

    I was kicking around the idea to try and handle snapshot deltas
    better.

    The idea is that you could take your initial snapshot A then sync
    that snapshot to your backup system.

    At a later point you could take another snapshot B.

    Because a snapshot holds copies of the original data as of the
    time of the snapshot, and unmodified data points to the live
    volume, it is possible to tell which blocks of data have changed
    since the snapshot was taken.

    Now that you have a second snapshot you can in essence perform a
    diff on the A and B snapshots to get only the blocks that changed
    up to the time that B was taken.

    These blocks could be copied to the backup image and you should
    have a clone of the B snapshot.

    You would not have to read the whole volume image, just the
    changed blocks, dramatically improving the speed of the backup.

    At this point you can delete the A snapshot and promote the B
    snapshot to be the A snapshot for the next backup round.
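
The snapshot-delta idea described above can be sketched with a toy model. This assumes a copy-on-write snapshot that saves a block's original contents the first time the live volume overwrites it. `Volume`, `Snapshot` and `incremental_backup` are hypothetical names used only for illustration, not Gluster, LVM or ZFS APIs; a real tool would read the snapshot's on-disk change metadata rather than a Python dict.

```python
class Volume:
    """Toy live volume: a list of blocks plus its active snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Copy-on-write: before overwriting, each snapshot keeps the
        # original block if it has not already saved one.
        for snap in self.snapshots:
            snap.cow.setdefault(index, self.blocks[index])
        self.blocks[index] = data


class Snapshot:
    """Toy snapshot: saved originals, falling through to the live volume."""

    def __init__(self, volume):
        self.volume = volume
        self.cow = {}  # block index -> contents at snapshot time

    def read(self, index):
        return self.cow.get(index, self.volume.blocks[index])

    def changed_blocks(self):
        # Every block overwritten on the live volume since this snapshot.
        return sorted(self.cow)


def incremental_backup(backup, snap_a, snap_b):
    """Bring a backup that matches snapshot A up to snapshot B.

    Only blocks recorded as changed since A are read and copied.
    Blocks that changed only after B was taken are copied too, which
    is harmless: B still returns their contents as of B.
    """
    changed = snap_a.changed_blocks()
    for index in changed:
        backup[index] = snap_b.read(index)
    return len(changed)
```

In this model the backup cost is proportional to the number of changed blocks rather than to the size of the volume, which is the speed-up the proposal is after.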


    On 03/23/2017 03:53 PM, Gandalf Corvotempesta wrote:

    Are backups consistent?
    What happens if the header on shard0 is synced while it refers to
    some data on shard450, and by the time rsync parses shard450 that
    data has been changed by subsequent writes?

    The header would be backed up out of sync with respect to the rest
    of the image.

    On 23 Mar 2017 8:48 PM, "Joe Julian" <[email protected]> wrote:

        The rsync protocol only passes blocks that have actually
        changed. Raw images change fewer bits. You're right, though,
        that it still has to check the entire file for those changes.
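
The point about rsync can be illustrated with a simplified sketch. This is not rsync's actual algorithm (which uses rolling checksums to find blocks that have moved); it only compares fixed-size blocks at the same offsets, and `block_digests` and `changed_blocks` are hypothetical helper names. It shows why both copies must be read in full even when very little is sent.

```python
import hashlib


def block_digests(data, block_size):
    """Hash every fixed-size block; this reads the whole input."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=4):
    """Return indexes of blocks in `new` that differ from `old`.

    Both inputs are hashed completely, but only the returned blocks
    would need to be transferred.
    """
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [
        i for i, digest in enumerate(new_digests)
        if i >= len(old_digests) or digest != old_digests[i]
    ]
```

For example, `changed_blocks(b"aaaabbbbcccc", b"aaaaBBBBcccc")` reports only block 1 as changed, although all twelve bytes on each side were hashed.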


        On 03/23/17 12:47, Gandalf Corvotempesta wrote:

        Raw or qcow doesn't change anything about the backup.
        Georep always has to sync the whole file.

        Additionally, raw images have far fewer features than qcow.

        On 23 Mar 2017 8:40 PM, "Joe Julian" <[email protected]> wrote:

            I always use raw images. And yes, sharding would also be
            good.


            On 03/23/17 12:36, Gandalf Corvotempesta wrote:

            Georep exposes another problem:
            When using gluster as storage for VMs, the VM file is
            saved as qcow. Changes are inside the qcow, thus rsync
            has to sync the whole file every time.

            A small workaround would be sharding, as rsync has to
            sync only the changed shards, but I don't think this is
            a good solution.
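
To illustrate why sharding narrows what has to be synced, here is a minimal sketch that maps a write inside a VM image to the shard indexes it touches. `touched_shards` is a hypothetical helper, and the shard size is a parameter supplied by the caller rather than a statement about Gluster's defaults.

```python
def touched_shards(offset, length, shard_size):
    """Return the range of shard indexes covered by a write.

    offset and length are in bytes from the start of the image file;
    shard_size is the size of each shard in bytes (an assumption
    passed in by the caller).
    """
    if length <= 0:
        return range(0)
    first = offset // shard_size
    last = (offset + length - 1) // shard_size
    return range(first, last + 1)
```

With 64-byte shards, a 10-byte write at offset 60 touches shards 0 and 1 only, so a file-level sync would need to revisit just those two shard files instead of the whole image.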

            On 23 Mar 2017 8:33 PM, "Joe Julian" <[email protected]> wrote:

                In many cases, a full backup set is just not
                feasible. Georep to the same or a different DC may be
                an option if the bandwidth can keep up with the
                change set. If not, maybe break the data up into
                smaller, more manageable volumes where you only keep
                a smaller set of critical data and just back that
                up. Perhaps an object store (swift?) might handle
                fault tolerance distribution better for some workloads.

                There's no one right answer.


                On 03/23/17 12:23, Gandalf Corvotempesta wrote:

                Backing up from inside each VM doesn't solve the
                problem.
                If you have to back up 500 VMs you need more than
                1 day, and what if you have to restore the whole
                gluster storage?

                How many days do you need to restore 1PB?

                Probably the only solution would be a georep in
                the same datacenter/rack with a similar cluster,
                ready to become the master storage.
                In this case you don't need to restore anything, as
                the data is already there, only a little bit back in
                time, but this doubles the TCO.

                On 23 Mar 2017 6:39 PM, "Serkan Çoban" <[email protected]> wrote:

                    Assuming a backup window of 12 hours, you need
                    to send data at 25GB/s to the backup solution.
                    Using 10G Ethernet on the hosts, you need at least
                    25 hosts to handle 25GB/s.
                    You can create an EC gluster cluster that can
                    handle these rates, or you can just back up the
                    valuable data from inside the VMs using open source
                    backup tools like borg, attic, restic, etc.
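
The arithmetic behind these figures can be checked with a short sketch. It assumes 1PB means 10^15 bytes and that a 10G Ethernet link carries at most 1.25 GB/s; the function names are hypothetical. Exactly 1PB in 12 hours works out to about 23 GB/s, which the message rounds up to 25GB/s, and the "at least 25 hosts" figure is consistent with roughly 1 GB/s of usable throughput per host (an assumption, not something stated in the thread).

```python
import math


def required_throughput_gb_s(capacity_bytes, window_hours):
    """Sustained rate in GB/s needed to move capacity_bytes in the window."""
    return capacity_bytes / (window_hours * 3600) / 1e9


def hosts_needed(throughput_gb_s, per_host_gb_s):
    """Minimum number of hosts if each one sustains per_host_gb_s."""
    return math.ceil(throughput_gb_s / per_host_gb_s)


def transfer_days(capacity_bytes, throughput_gb_s):
    """Days needed to move capacity_bytes at a sustained rate in GB/s."""
    return capacity_bytes / (throughput_gb_s * 1e9) / 86400
```

The same helpers bear on the restore question raised earlier in the thread: `transfer_days(1e15, 1.25)` comes to a little over nine days for 1PB through a single 10G link running at line rate.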

                    On Thu, Mar 23, 2017 at 7:48 PM, Gandalf Corvotempesta
                    <[email protected]> wrote:
                    > Let's assume 1PB of storage full of VM images, with
                    > each brick over ZFS, replica 3, sharding enabled.
                    >
                    > How do you backup/restore that amount of data?
                    >
                    > Backing up daily is impossible: you'll never finish
                    > one backup before the following one starts (in other
                    > words, you need more than 24 hours).
                    >
                    > Restoring is even worse. You need more than 24 hours
                    > with the whole cluster down.
                    >
                    > You can't rely on ZFS snapshots because of sharding
                    > (a snapshot taken from one node is useless without
                    > the other nodes' snapshots of the related shards),
                    > and you still have the same restore speed.
                    >
                    > How do you back this up?
                    >
                    > Even georep isn't enough if you have to restore the
                    > whole storage in case of disaster.
                    >
                    > _______________________________________________
                    > Gluster-users mailing list
                    > [email protected]
                    > http://lists.gluster.org/mailman/listinfo/gluster-users




--
Alvin Starr                   ||   voice: (905)513-7688
Netvel Inc.                   ||   Cell:  (416)806-0133
[email protected]              ||

