On 17/08/11 16:19, Dan Bretherton wrote:
Dan Bretherton wrote:
On 15/08/11 20:00, [email protected] wrote:
Message: 1
Date: Sun, 14 Aug 2011 23:24:46 +0300
From: "Deyan Chepishev - SuperHosting.BG"<[email protected]>
Subject: [Gluster-users] cluster.min-free-disk separate for each
brick
To: [email protected]
Message-ID:<[email protected]>
Content-Type: text/plain; charset=UTF-8; format=flowed
Hello,
I have a gluster set up with very different brick sizes.
brick1: 9T
brick2: 9T
brick3: 37T
With this configuration, if I set cluster.min-free-disk to 10% it
applies to all bricks, which is quite awkward with these brick sizes:
10% of a small brick is roughly 1T, but 10% of the big brick is
roughly 3.7T. What happens in the end is that if all bricks reach 90%
usage and I continue writing, the small ones eventually fill up to
100% while the big one still has plenty of free space.
My question is: is there a way to set cluster.min-free-disk per brick
instead of for the entire volume, or any other way to work around
this problem?
Thank you in advance
Regards,
Deyan
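To make the mismatch concrete, here is a quick back-of-the-envelope
calculation (plain shell arithmetic via awk, not a Gluster command) of
what a 10% reserve means on each of the brick sizes above:

```shell
# 10% min-free-disk reserve for each brick size (in TB), computed with awk
for size_tb in 9 9 37; do
    awk -v s="$size_tb" \
        'BEGIN { printf "brick %2dT -> 10%% reserve = %.1fT\n", s, s * 0.10 }'
done
# brick  9T -> 10% reserve = 0.9T
# brick  9T -> 10% reserve = 0.9T
# brick 37T -> 10% reserve = 3.7T
```

So the same percentage reserves nearly 3T more on the large brick than
on each small one, which is exactly the imbalance described above.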
Hello Deyan,
I have exactly the same problem and I have asked about it before -
see links below.
http://community.gluster.org/q/in-version-3-1-4-how-can-i-set-the-minimum-amount-of-free-disk-space-on-the-bricks/
http://gluster.org/pipermail/gluster-users/2011-May/007788.html
My understanding is that the patch referred to in Amar's reply in
the May thread prevents a "migrate-data" rebalance operation from
failing by running out of space on smaller bricks, but that doesn't
solve the problem we are having. Being able to set min-free-disk for
each brick separately would be useful, as would being able to set the
value as a number of bytes rather than a percentage. However, even
if these features were present we would still have a problem when
the amount of free space falls below min-free-disk, because that
just produces a warning message in the logs and doesn't actually
prevent more files from being written. In other words, min-free-disk
is a soft limit rather than a hard limit. When a volume is more than
90% full there may still be hundreds of gigabytes of free space
spread over the large bricks, but the small bricks may each have only
a few gigabytes left, or even less. Users run "df", see lots of free
space in the volume, and continue writing files. However, when
GlusterFS chooses to write a file to a small brick, the write fails
with "device full" errors if the file grows too large, which is often
the case here, with files for some applications typically several
gigabytes in size.
I would really like to know if there is a way to make min-free-disk
a hard limit. Ideally, GlusterFS would choose the brick on which to
write a file based on how much free space it has left, rather than
choosing a brick at random (or however it is done now). That would
solve the problem of non-uniform brick sizes without the need for a
hard min-free-disk limit.
Amar's comment in the May thread about QA testing being done only on
volumes with uniform brick sizes prompted me to start standardising
on a uniform brick size for each volume in my cluster. My
impression is that implementing the features needed for users with
non-uniform brick sizes is not a priority for Gluster, and that
users are all expected to use uniform brick sizes. I really think
this fact should be stated clearly in the GlusterFS documentation,
in the sections on creating volumes in the Administration Guide for
example. That would stop other users from going down the path that
I did initially, which has given me a real headache because I am now
having to move tens of terabytes of data off bricks that are larger
than the new standard size.
Regards
Dan.
Hello,
This is really bad news, because I have already migrated my data and
just realized that I am screwed, because Gluster simply does not take
brick sizes into account.
It is impossible for us to move to uniform brick sizes.
Currently we use 2TB HDDs, but disks keep growing and soon we will
probably use 3TB drives, or whatever larger sizes appear on the
market. So if we choose to use RAID5 with some level of redundancy
(for example six drives in RAID5, whatever their size), sooner or
later this will lead us to non-uniform bricks, which is a problem;
it is not reasonable to expect that we always can, or want to,
provide uniformly sized bricks.
By that logic, if we currently have 10T from 6x2T in RAID5, then at
some point, when a single disk holds 10T, we will have to use no RAID
at all just because Gluster cannot handle non-uniform bricks.
Regards,
Deyan
I think Amar might have provided the answer in his posting to the
thread yesterday, which has just appeared in my autospam folder.
http://gluster.org/pipermail/gluster-users/2011-August/008579.html
With size option, you can have a hardbound on min-free-disk
This means that you can set a hard limit on min-free-disk, and set a
value in GB that is bigger than the biggest file that is ever likely
to be written. This looks likely to solve our problem and make
non-uniform brick sizes a practical proposition. I wish I had known
about this back in May when I embarked on my cluster restructuring
exercise; the issue was discussed in this thread in May as well:
http://gluster.org/pipermail/gluster-users/2011-May/007794.html
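If that is how it works, the configuration would presumably look
something like the fragment below ("myvol" is a placeholder volume
name, and I am going from my reading of the thread rather than the
manual, so please verify the syntax against your GlusterFS version):

```shell
# Set a size-based (rather than percentage) min-free-disk on the volume.
# The 20GB value is an example; it should exceed the largest file you
# ever expect to write, so that a brick is never chosen that cannot
# hold the file.
gluster volume set myvol cluster.min-free-disk 20GB

# Confirm the option took effect
gluster volume info myvol | grep min-free-disk
```

This is a configuration fragment that needs a live trusted pool to
run, so treat it as a sketch, not a tested recipe.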
Once I have moved all the data off the large bricks and standardised
on a uniform brick size, it will be relatively easy to stick to it
because I use LVM: I create logical volumes for new bricks whenever a
volume needs extending. The only problem with this approach is what
happens when the free space left on a server is less than the size of
the brick you want to create. The only option then is to use new
servers, potentially wasting several TB of free space on the existing
ones. The standard brick size for most of my volumes is 3TB, which
lets me mix small and large servers in a volume and limits the amount
of free space wasted when a server doesn't have quite enough room for
another brick. Another consequence of having 3TB bricks is that a
single server typically has two or more bricks belonging to the same
volume, although I do my best to distribute the volumes across
different servers in order to spread the load. Exporting multiple
bricks from a single server has not caused me any problems so far.
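For anyone curious, the LVM workflow for adding a brick is roughly as
follows (a sketch only: the volume group "vg0", the LV and mount-point
names, the server name, and the choice of XFS are all placeholders,
and the commands need root on the brick server plus a live pool):

```shell
# Carve a new 3TB brick out of free space in volume group vg0
lvcreate -L 3T -n brick_myvol_5 vg0

# Put a filesystem on it and mount it at the export path
mkfs.xfs /dev/vg0/brick_myvol_5
mkdir -p /export/brick_myvol_5
mount /dev/vg0/brick_myvol_5 /export/brick_myvol_5

# Then, from any server in the trusted pool, extend the Gluster volume
gluster volume add-brick myvol server5:/export/brick_myvol_5
```

Keeping every brick the same LV size is what makes the uniform-brick
policy easy to enforce.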
-Dan.
Hello Deyan,
Have you tried giving min-free-disk a value in gigabytes, and if so,
does it prevent new files from being written to your bricks when they
are nearly full? I recently tried it myself and found that
min-free-disk had no effect at all. I deliberately filled my
test/backup volume and most of the bricks became 100% full, even
though min-free-disk was set to "20GB", as reported by
"gluster volume ... info" below.
cluster.min-free-disk: 20GB
Unless I am doing something wrong, it seems we cannot "have a
hardbound on min-free-disk" after all, and a uniform brick size is
therefore an essential requirement. The documentation still doesn't
say so, at least not in the volume creation sections.
-Dan.
--
Mr. D.A. Bretherton
Computer System Manager
Environmental Systems Science Centre
Harry Pitt Building
3 Earley Gate
University of Reading
Reading, RG6 6AL
UK
Tel. +44 118 378 5205
Fax: +44 118 378 6413
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users