Hi,

On 02/10/10 02:29 AM, Darren Mackay wrote:
> Item :  Support for file-system / volume / san dedup for file devices
>
> Date:   10 Feb 2010
>
> Origin: Darren Mackay (Velitium)
>
> Status:
>
> What:   File devices should provide support for block based
> deduplication provided by the underlying file-systems / volume manager /
> san.

I disagree. Dedup should be transparent to the applications using
dedup-enabled storage, in much the same way that the actual RAID
implementation is transparent to the storage daemon.

> Why:    A number of file-systems / volume managers / sans now provide
> block based deduplication. For block level dedup, it is not uncommon for
> deduplication ratios to be 3x, 4x, or 5x for unstructured data.
>
> Currently it appears (forgive me and advise if this is actually
> incorrect, as this is drawn from a number of forum posts) that the
> bacula storage daemon is packing the data-stream back-2-back, which
> prevents block based deduplication as the data-stream is not aligned to
> blocks as defined by the underlying storage device. I have also read
> several posts that indicate that bacula may multiplex data streams,
> which in the case of underlying dedup, would further prevent dedup from
> being performed.
>
> Allowing for dedup in the underlying file-system / volume / san would
> also alleviate the need for sysadmins to tune baselines between
> different hosts which use the same storage daemon file device(s).
>
> Notes:
>
> Based on limited testing, some dedup is able to be performed, but the
> number of duplicate blocks detected is limited. For instance,
> consecutive full backups from a single client machine (approx 200GB,
> both o/s and unstructured file data) for only a single concurrent job
> should have resulted in a significant portion of the backup being
> detected as duplicate blocks by the underlying storage (OpenSolaris ZFS
> in this case); however, the actual amount of dedup detected for the 2nd
> full backup was approx 70k blocks (~ 8.5GB). Subsequent runs of the full
> backup yielded similar results. Allowing for metadata, I would have
> expected at least 80% of the full backup to dedup.
>
> Several levels of dedup support, which could be implemented in a staged
> approach.
>
> Phase 1 - File device dedup support
> - This would allow for dedup between file devices on the same system.
> - Add padding at the end of each file to a user configurable block size.
>
>     DedupBlockSize = 8k (configurable, in bytes)
>
> - If the configuration option is missing, then disable all support for
> underlying dedup for file devices.
>
> Phase 2 - Autodetection of dedup supported file-systems
> - When dedup is provided by the host o/s of the file system device, the
> storage daemon should detect if dedup is enabled for the file device
> location. For Solaris / Opensolaris ZFS, this value is available through
> the filesystem extended properties. In this case, if dedup is enabled
> for the ZFS filesystem, the storage daemon should read the filesystem
> block size and use this value. (note - ZFS also uses variable block
> sizes, and thus will only allocate the required size if the requirement
> is less than the actual block size)

The storage daemon uses a default block size of 64k. Was your ZFS fs
tuned to that block size?

You can configure the blocksize used by the storage daemon using the 
"maximum / minimum block size" parameters which, according to the 
manual, will add padding to a block to allow blocksize alignment.
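
To illustrate why alignment matters here, below is a minimal Python
sketch of a toy block-level dedup check. It is not Bacula's volume
format or ZFS's implementation: the 8k block size, the header strings
and all function names are assumptions made up for the example. It only
shows that the same payload written behind headers of different,
unpadded lengths shares no fixed-size blocks, while padding the headers
to the block size makes the payload blocks line up again.

```python
import hashlib

BLOCK = 8192  # hypothetical dedup block size in bytes


def pseudo_data(tag: str, size: int) -> bytes:
    """Build deterministic, non-repeating bytes to stand in for file data."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(f"{tag}:{counter}".encode()).digest()
        counter += 1
    return bytes(out[:size])


def block_hashes(data: bytes, block: int = BLOCK) -> list:
    """Hash each fixed-size block, roughly as a block-level dedup engine would."""
    return [hashlib.sha256(data[i:i + block]).digest()
            for i in range(0, len(data), block)]


def pad_to_block(data: bytes, block: int = BLOCK) -> bytes:
    """Zero-pad data so that its length is a multiple of the block size."""
    return data + b"\0" * (-len(data) % block)


def shared_blocks(first: bytes, second: bytes) -> int:
    """Count blocks of `second` whose hash already occurs in `first`."""
    seen = set(block_hashes(first))
    return sum(1 for h in block_hashes(second) if h in seen)


payload = pseudo_data("unchanged-files", 16 * BLOCK)
header1 = b"job-1 metadata"          # hypothetical, 14 bytes
header2 = b"job-2 metadata, longer"  # hypothetical, 22 bytes

# Packed back to back: the payload starts at a different offset each run.
unaligned_hits = shared_blocks(header1 + payload, header2 + payload)

# Headers padded to the block size: the payload starts on a block boundary.
aligned_hits = shared_blocks(pad_to_block(header1) + payload,
                             pad_to_block(header2) + payload)

print(unaligned_hits, aligned_hits)
```

In this toy model only the padded variant lets the second run reuse the
payload blocks of the first. Whether real volumes behave the same way
depends on how the storage daemon lays out its own records.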

Keeping one job per volume did increase our dedup ratio during testing.
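
One possible explanation, sketched in Python under loud assumptions:
this is a toy model, not Bacula's actual on-disk layout. The 5000 byte
record size, the 8k dedup block size and every name in it are
hypothetical. It compares two nights of backups where client A's data is
unchanged, once written to its own volume and once interleaved with
another client's changing data in a shared volume.

```python
import hashlib

BLOCK = 8192   # hypothetical dedup block size in bytes
RECORD = 5000  # hypothetical size of each interleaved record


def pseudo_data(tag: str, size: int) -> bytes:
    """Build deterministic, non-repeating bytes to stand in for job data."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(f"{tag}:{counter}".encode()).digest()
        counter += 1
    return bytes(out[:size])


def block_hashes(data: bytes, block: int = BLOCK) -> list:
    """Hash each fixed-size block of a volume."""
    return [hashlib.sha256(data[i:i + block]).digest()
            for i in range(0, len(data), block)]


def shared_blocks(first: bytes, second: bytes) -> int:
    """Count blocks of `second` whose hash already occurs in `first`."""
    seen = set(block_hashes(first))
    return sum(1 for h in block_hashes(second) if h in seen)


def interleave(a: bytes, b: bytes, record: int = RECORD) -> bytes:
    """Alternate fixed-size records from two jobs into one volume."""
    out = bytearray()
    for i in range(0, max(len(a), len(b)), record):
        out += a[i:i + record]
        out += b[i:i + record]
    return bytes(out)


job_a = pseudo_data("client-a", 20 * BLOCK)              # unchanged both nights
job_b_night1 = pseudo_data("client-b-night1", 20 * BLOCK)
job_b_night2 = pseudo_data("client-b-night2", 20 * BLOCK)

# One job per volume: client A's volume is identical on both nights.
separate_hits = shared_blocks(job_a, job_a)

# Shared volume: every block also contains some of client B's changed data.
mixed_hits = shared_blocks(interleave(job_a, job_b_night1),
                           interleave(job_a, job_b_night2))

print(separate_hits, mixed_hits)
```

In this model none of client A's unchanged data dedups once it shares
blocks with another job's data, which would be consistent with what we
saw, although we did not verify that this is the actual cause.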

> Phase 3 - Alignment of the datastream to underlying file-system blocks
> and separation of bacula metadata into separate blocks
> - This would allow for underlying storage system deduplication between
> both bacula file devices and real data stored elsewhere on the
> file-system / volume / san.
>


-- 
Med venlig hilsen / Best Regards

Henrik Johansen
[email protected]
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet

_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel
