Hello,

1) RBD is sparsely allocated, meaning that creating a 10TB volume
initially takes up close to no space at all.

2) It is also completely file system agnostic (as the B for block
indicates; the same misconception crops up with DRBD all the time, too).

That means that when you write 8TB in your test, those 8TB get allocated.
And of course they get allocated as many times as your replication factor
is set to. By default that would be 2, so 16TB plus whatever overhead Ceph
brings along. This matches your numbers. Look at the "ceph -s" output for
data in contrast to used space.
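The arithmetic above can be sketched in a few lines; a minimal sketch, where the function name and the overhead term are my own placeholders, and the numbers (8TB written, replication factor 2) are from the thread:

```python
def raw_usage(data_written_tb, replication=2, overhead_tb=0.0):
    """Raw space consumed cluster-wide for a given amount of logical
    data under Ceph replication.  overhead_tb is a stand-in for
    journal/filesystem overhead, not a real Ceph figure."""
    return data_written_tb * replication + overhead_tb

# 8 TB written with the (then) default replication factor of 2
print(raw_usage(8))  # -> 16.0
```

Which is why writing 8TB into the image shows up as roughly 16TB of used space in "ceph -s".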
Now when you delete anything in your image, you need to remember 2)
above.
Depending on how your file system allocates blocks (BTRFS would be the
worst here), previously allocated "blocks" will be re-used, but eventually
you will wind up using all the space (multiplied by the replication
setting).

To free up that space you need to issue a TRIM/DISCARD command, which is
not possible with the kernelspace RBD module; see the thread called
"question on harvesting freed space" from just 2 weeks ago.
It is possible from librbd with VMs, but if you read that thread you
will see that it also has some gotchas.
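For the librbd/VM case, discard has to be enabled all the way down the stack. A minimal libvirt disk definition that passes the guest's TRIM through to librbd might look like the fragment below (pool/image name and device names are illustrative; at the time this required a bus that supports DISCARD, e.g. virtio-scsi or IDE, as plain virtio-blk did not pass it through):

```xml
<!-- discard='unmap' tells QEMU to forward guest DISCARDs to the backend -->
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' discard='unmap'/>
  <source protocol='rbd' name='rbd/myimage'/>  <!-- pool/image: illustrative -->
  <target dev='sda' bus='scsi'/>               <!-- virtio-scsi controller -->
</disk>
```

With that in place, running fstrim on the mounted file system inside the guest is what actually releases the freed blocks back to Ceph.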

The sparse allocation is nice, but in the end you need to provide the disk
space you're expecting to use at the block device level.

Of course you could just delete that image and all space would be returned.

Regards,

Christian
On Mon, 28 Apr 2014 09:41:31 -0400 Alphe Salas wrote:

> Hello,
> I need the rbd kernel module to really delete data on the OSDs' disks.
> Having ever-growing "hidden data" is not a great solution.
> 
> Then we can say that, first of all, we should at least be able to
> manually strip out the "hidden" data, a.k.a. the replicas.
> 
> I use an rbd image, let's say it is 10TB on an overall available space
> of 25TB. What real-world experience shows me is that if I write 8TB of
> my 10TB in a row, then the overall used space is around 18TB. Then I
> delete 4TB from the rbd image and write 4TB, and the overall used space
> grows by 4TB; of course the pgs used by the rbd image will be reused and
> overwritten, but the corresponding replicas will not be.
> In the end, after round 2 of writing, the overall used space is 22TB,
> at which point I get stuff like this:
> 
>              2034 active+clean
>                 7 active+remapped+wait_backfill+backfill_toofull
>                 7 active+remapped+backfilling
> 
> I tried to use "ceph osd reweight-by-utilization" but that didn't solve
> the problem. And even if it did, it would only be momentary, because
> after cleaning another 4TB and writing 4TB I will reach the full ratio
> and get my OSDs stuck until I spend 12,000 dollars to expand my ceph
> cluster. Because when you run a 40TB ceph cluster, adding 4TB isn't
> much of a difference.
> 
> In the end, for 40TB of real space (20 disks of 2TB), after initial
> formatting I get a 37TB cluster of available space. Then I create an
> 18TB rbd image, and can't use much more than 16TB before my OSDs show
> stuck pgs.
> 
> In the end, 37TB for 16TB of usable disk space is not a great solution
> at all, because I lose 60% of my data storage.
> 
> As for how to delete data, I really don't know; the easiest way I can
> see is at least to be able to manually tell the rbd kernel module to
> clean "released" data from the OSDs when we see fit, at "maintenance
> time".
> 
> If doing it automatically has too bad an impact on overall performance,
> I would still be glad to be able to pick an appropriate moment to force
> a cleaning task; that would be better than nothing and an ever-growing
> "hidden" data situation.
> 
> Regards,
> 


-- 
Christian Balzer        Network/Systems Engineer                
[email protected]           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
