>>>>> On Wed, 24 Apr 2024 23:40:31 +1000, Gary R Schmidt said:
> 
> On 24/04/2024 22:33, Gary R. Schmidt wrote:
> > On 24/04/2024 21:30, Roberto Greiner wrote:
> >>
> >> On 24/04/2024 04:30, Radosław Korzeniewski wrote:
> >>> Hello,
> >>>
> >>> On Tue, 23 Apr 2024 at 13:33, Roberto Greiner <mrgrei...@gmail.com>
> >>> wrote:
> >>>
> >>>
> >>>     On 23/04/2024 04:34, Radosław Korzeniewski wrote:
> >>>>     Hello,
> >>>>
> >>>>     On Wed, 17 Apr 2024 at 14:01, Roberto Greiner <mrgrei...@gmail.com>
> >>>>     wrote:
> >>>>
> >>>>
> >>>>         The error is at the end of the page, where it says that you
> >>>>         can see how much space is being used with 'df -h'. The problem
> >>>>         is that df cannot actually see the space gained from dedup; it
> >>>>         shows how much would be used without dedup.
> >>>>
> >>>>
> >>>>     This command (df -h) shows how much allocated and free space is
> >>>>     available on the filesystem. So when you have a dedup ratio of
> >>>>     20:1 and you wrote 20TB, then df shows 1TB allocated.
> >>>
> >>>     But that is the exact problem I had. df did NOT show 1TB
> >>>     allocated. It indicated 20TB allocated (yes, in ZFS).
> >>>
> >>> I have not used ZFS dedup for a long time (I've been a ZFS user since
> >>> the first beta in Solaris), so I'm curious: if your zpool is 2TB in
> >>> size and you have a 20:1 dedup ratio with 20TB written and 1TB
> >>> allocated, what does df show for you?
> >>> Something like this?
> >>> Size: 2TB
> >>> Used: 20TB
> >>> Avail: 1TB
> >>> Use%: 2000%
> >>>
> >> No, the values are quite different. I wrote 20TB to stay with the
> >> example previously given. My actual numbers are:
> >>
> >> df: 2.9TB used
> >> zpool list: 862GB used, 3.4x dedup ratio.
> >> Actual partition size: 7.2TB
> >>
> > > Use zpool list to examine file space.
> > > Or zfs list.

On FreeBSD at least, zfs list will show the same as df (i.e., it will include
all copies of the deduplicated data in the USED column).

I think the reason is that deduplication is done at the pool level, so there
is no single definition of which dataset owns each deduplicated block.  As a
result, each dataset is charged in full for the blocks it references, so the
shared blocks end up being counted multiple times.  This is different from a
cloned dataset, where the original dataset owns any blocks that are shared.
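
As a rough illustration of where each view comes from (a sketch only; "tank"
and the /tank mountpoint are placeholders, and column aliases can differ
slightly between OpenZFS releases), comparing the three commands side by side
makes the difference visible:

    # Pool-level view: physical allocation, free space and the dedup ratio
    zpool list -o name,size,allocated,free,dedupratio tank

    # Dataset-level view: USED/REFER here are pre-dedup (logical) figures
    zfs list -o name,used,avail,refer tank

    # df reports the same dataset-level accounting as zfs list
    df -h /tank

Only the pool-level output reflects the space actually saved by dedup, which
is why zpool list and df/zfs list can disagree so widely on a deduplicated
pool.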

__Martin


