On Wed, Sep 7, 2011 at 4:27 PM, Dan Bretherton
<[email protected]> wrote:
On 17/08/11 16:19, Dan Bretherton wrote:
Dan Bretherton wrote:
On 15/08/11 20:00, [email protected] wrote:
Date: Sun, 14 Aug 2011 23:24:46 +0300
From: "Deyan Chepishev -
SuperHosting.BG"<[email protected]
<mailto:[email protected]>>
Subject: [Gluster-users] cluster.min-free-disk separate for each brick
To: [email protected]
Message-ID:<[email protected]
<mailto:[email protected]>>
Content-Type: text/plain; charset=UTF-8; format=flowed
Hello,
I have a Gluster setup with very different brick sizes:
brick1: 9T
brick2: 9T
brick3: 37T
With this configuration, if I set the parameter
cluster.min-free-disk to 10%, it applies to all bricks,
which is quite awkward with these brick sizes: 10% of a
small brick is ~1T, but 10% of the big brick is ~3.7T.
What happens in the end is that if all bricks reach 90%
usage and I continue writing, the small ones eventually
fill up to 100% while the big one still has plenty of
free space.
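For reference, I set the option volume-wide like this (the
volume name is just an example):

    gluster volume set myvol cluster.min-free-disk 10%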
My question is: is there a way to set
cluster.min-free-disk per brick instead of setting it
for the entire volume, or any other way to work around
this problem?
Thank you in advance
Regards,
Deyan
Hello Deyan,
I have exactly the same problem and I have asked about
it before - see links below.
http://community.gluster.org/q/in-version-3-1-4-how-can-i-set-the-minimum-amount-of-free-disk-space-on-the-bricks/
http://gluster.org/pipermail/gluster-users/2011-May/007788.html
My understanding is that the patch referred to in
Amar's reply in the May thread prevents a
"migrate-data" rebalance operation from failing by
running out of space on smaller bricks, but that
doesn't solve the problem we are having. Being able to set
min-free-disk for each brick separately would be
useful, as would being able to set this value as a
number of bytes rather than a percentage. However,
even if these features were present we would still
have a problem when the amount of free space becomes
less than min-free-disk, because this just results in
a warning message in the logs and doesn't actually
prevent more files from being written. In other
words, min-free-disk is a soft limit rather than a
hard limit. When a volume is more than 90% full there
may still be hundreds of gigabytes of free space
spread over the large bricks, but the small bricks may
each only have a few gigabytes left, or even less.
Users do "df" and see lots of free space in the
volume so they continue writing files. However, when
GlusterFS chooses to write a file to a small brick,
the write fails with "device full" errors if the file
grows too large, which is often the case here with
files typically several gigabytes in size for some
applications.
I would really like to know if there is a way to make
min-free-disk a hard limit. Ideally, GlusterFS would
choose a brick on which to write a file based on how
much free space it has left rather than choosing a
brick at random (or however it is done now). That
would solve the problem of non-uniform brick sizes
without the need for a hard min-free-disk limit.
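To illustrate the sort of decision I mean, here is a toy
one-liner that picks the brick filesystem with the most
available space (the brick paths are invented, and this is
of course not how the DHT translator actually chooses
bricks):

    # POSIX df output: column 4 is available KB, column 6 is the mount point
    df -kP /export/brick1 /export/brick2 /export/brick3 \
        | tail -n +2 | sort -k4 -rn | head -1 | awk '{print $6, $4}'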
Amar's comment in the May thread about QA testing
being done only on volumes with uniform brick sizes
prompted me to start standardising on a uniform brick
size for each volume in my cluster. My impression is
that implementing the features needed for users with
non-uniform brick sizes is not a priority for Gluster,
and that users are all expected to use uniform brick
sizes. I really think this fact should be stated
clearly in the GlusterFS documentation, in the
sections on creating volumes in the Administration
Guide for example. That would stop other users from
going down the path that I did initially, which has
given me a real headache because I am now having to
move tens of terabytes of data off bricks that are
larger than the new standard size.
Regards
Dan.
Hello,
This is really bad news. I have already migrated my
data, and I have just realized that I am screwed because
Gluster simply does not care about brick sizes.
It is impossible for us to move to uniform brick sizes.
Currently we use 2TB HDDs, but disks keep growing and
soon we will probably be using 3TB HDDs or whatever
larger sizes appear on the market. So if we choose to
use RAID5 or some other level of redundancy (for
example six HDDs in RAID5, no matter what their size
is), sooner or later this will lead us to non-uniform
bricks. That is a problem, and it is not reasonable to
expect that we always can, or want to, provide
uniform-size bricks.
Following that logic, if we currently have 10T from
6x2T in RAID5 (five data disks' worth), then at some
point, when a single disk holds 10T, we would have to
use no RAID at all simply because Gluster cannot handle
non-uniform bricks.
Regards,
Deyan
I think Amar might have provided the answer in his posting to
the thread yesterday, which has just appeared in my autospam
folder.
http://gluster.org/pipermail/gluster-users/2011-August/008579.html
"With size option, you can have a hardbound on min-free-disk."
This means that you can set a hard limit on min-free-disk, and
set a value in GB that is bigger than the biggest file that is
ever likely to be written. This looks likely to solve our
problem and make non-uniform brick sizes a practical
proposition. I wish I had known about this back in May when I
embarked on my cluster restructuring exercise; the issue was
discussed in this thread in May as well:
http://gluster.org/pipermail/gluster-users/2011-May/007794.html
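In other words, the option can apparently be given a size
instead of a percentage, along these lines (the volume name
is a placeholder):

    gluster volume set <volname> cluster.min-free-disk 20GB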
Once I have moved all the data off the large bricks and
standardised on a uniform brick size, it will be relatively
easy to stick to this because I use LVM. I create logical
volumes for new bricks when a volume needs extending. The
only problem with this approach is what happens when the
amount of free space left on a server is less than the size of
the brick you want to create. The only option then would be
to use new servers, potentially wasting several TB of free
space on existing servers. The standard brick size for most
of my volumes is 3TB, which allows me to use a mixture of
small servers and large servers in a volume and limits the
amount of free space that would be wasted if there wasn't
quite enough free space on a server to create another brick.
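For what it's worth, creating another standard-size brick
is only a few commands with LVM; a rough sketch of my
routine (the volume group, names, and choice of filesystem
here are all just examples):

    # carve a new 3TB logical volume out of the server's volume group
    lvcreate -L 3T -n brick7 vg_bricks
    # put a filesystem on it and mount it where it will be exported
    mkfs.xfs /dev/vg_bricks/brick7
    mkdir -p /export/brick7
    mount /dev/vg_bricks/brick7 /export/brick7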
Another consequence of having 3TB bricks is that a single
server typically has two or more bricks belonging to the
same volume, although I do my best to distribute the
volumes across different servers in order to spread the
load. I am not aware of any problems associated with
exporting multiple bricks from a single server, and it
has not caused me any trouble so far.
-Dan.
Hello Deyan,
Have you tried giving min-free-disk a value in gigabytes, and if
so does it prevent new files from being written to your bricks when
they are nearly full? I recently tried it myself and found that
min-free-disk had no effect at all. I deliberately filled my
test/backup volume and most of the bricks became 100% full. I set
min-free-disk to "20GB", as reported in "gluster volume ... info"
below.
cluster.min-free-disk: 20GB
Unless I am doing something wrong, it seems as though we cannot
"have a hardbound on min-free-disk" after all, and uniform brick
size is therefore an essential requirement. It still doesn't say
that in the documentation, at least not in the volume creation
sections.
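Incidentally, the per-brick situation that a df on the
client mount point hides is easy to check with plain df
against the brick paths on each server (paths here are just
examples):

    df -h /export/brick1 /export/brick2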
-Dan.
On 08/09/11 06:35, Raghavendra Bhat wrote:
> This is how it is supposed to work.
>
> Suppose a distribute volume is created with 2 bricks. The 1st brick
> has 25GB of free space, the 2nd has 35GB. If one sets a 30GB
> minimum free disk through volume set (gluster volume set <volname>
> min-free-disk 30GB), then whenever a file is created and it hashes
> to the 1st brick (which has 25GB of free space), the actual file
> will be created on the 2nd brick, with a linkfile on the 1st brick
> pointing to the actual file. A warning that the minimum free disk
> limit has been crossed and that more nodes should be added will be
> printed in the glusterfs log file. So any file which hashes to the
> 1st brick will be created on the 2nd brick.
>
> Once the free space of the 2nd brick also falls below 30GB, files
> will be created on their respective hashed bricks, and there will
> be a warning message in the log file about the 2nd brick also
> crossing the minimum free disk limit.
>
> Regards,
> Raghavendra Bhat
Dear Raghavendra,
Thanks for explaining this to me. This mechanism should allow a volume
to function correctly with non-uniform brick sizes even though
min-free-disk is not a hard limit. I can understand now why I had so
many problems with the default value of 10% for min-free-disk. 10%
of a large brick can be very large compared to 10% of a small brick,
so once every brick had less than 10% free space and they all kept
filling at the same rate, the small bricks usually filled up long
before the large ones, giving "device full" errors even when df
still showed a lot of free space in the volume. At least now we can
minimise this effect by setting min-free-disk to a value in GB.
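Just to check my understanding, the placement behaviour you
describe amounts to something like this toy script (much
simplified, obviously not the real DHT code, and the brick
paths and threshold are invented; the real translator also
leaves a linkfile on the hashed brick, which I have left
out):

    #!/bin/bash
    # Hash the file name to a brick; if that brick is below the
    # min-free-disk threshold, try the next brick; if every brick is
    # below the threshold, fall back to the originally hashed brick.
    BRICKS=(/export/brick1 /export/brick2)   # example brick mounts
    MIN_FREE_KB=$((30 * 1024 * 1024))        # 30GB threshold, in KB

    place() {
        local name=$1 n=${#BRICKS[@]}
        local idx=$(( $(printf '%s' "$name" | cksum | cut -d' ' -f1) % n ))
        local i brick avail
        for i in $(seq 0 $((n - 1))); do
            brick=${BRICKS[$(( (idx + i) % n ))]}
            avail=$(df -kP "$brick" | awk 'NR==2 {print $4}')
            if [ "$avail" -gt "$MIN_FREE_KB" ]; then
                echo "$name -> $brick"
                return
            fi
        done
        echo "$name -> ${BRICKS[$idx]} (all bricks below min-free-disk)"
    }

    place bigfile.dat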
-Dan.
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users