Hi,

On 02/10/10 02:29 AM, Darren Mackay wrote:
> Item:   Support for file-system / volume / SAN dedup for file devices
> Date:   10 Feb 2010
> Origin: Darren Mackay (Velitium)
> Status:
>
> What:   File devices should provide support for block-based
>         deduplication provided by the underlying file-system /
>         volume manager / SAN.

I disagree. Dedup should be transparent to the applications using
dedup-enabled storage, in much the same way that the actual RAID
implementation is transparent to the storage daemon.

> Why: A number of file-systems / volume managers / SANs now provide
> block-based deduplication. For block-level dedup, it is not uncommon
> for deduplication ratios to be 3x, 4x, or 5x for unstructured data.
>
> Currently it appears (forgive me and advise if this is actually
> incorrect, as this is drawn from a number of forum posts) that the
> Bacula storage daemon is packing the data stream back-to-back, which
> prevents block-based deduplication as the data stream is not aligned
> to blocks as defined by the underlying storage device. I have also
> read several posts that indicate that Bacula may multiplex data
> streams, which in the case of underlying dedup would further prevent
> dedup from being performed.
>
> Allowing for dedup in the underlying file-system / volume / SAN would
> also alleviate the need for sysadmins to tune baselines between
> different hosts which use the same storage daemon file device(s).
>
> Notes:
>
> Based on limited testing, some dedup is able to be performed, but the
> number of duplicate blocks detected is limited. For instance,
> consecutive full backups from a single client machine (approx 200GB,
> both o/s and unstructured file data) with only a single concurrent job
> should have resulted in a significant portion of the backup being
> detected as duplicate blocks by the underlying storage (OpenSolaris
> ZFS in this case). However, the actual ratio of dedup detected for
> the 2nd full backup was approx 70k blocks (~ 8.5GB). Subsequent runs
> of the full backup yielded similar results. Allowing for metadata, I
> would have expected at least 80% of the full backup to dedup.
>
> Several levels of dedup support, which could be implemented in a
> staged approach.
>
> Phase 1 - File device dedup support
> - This would allow for dedup between file devices on the same system
> - Add padding at the end of each file to a user-configurable block
>   size.
>
>   DedupBlockSize = 8k (configurable, in bytes)
>
> - If the configuration option is missing, then disable all support
>   for underlying dedup for file devices.
>
> Phase 2 - Autodetection of dedup-supported file-systems
> - When dedup is provided by the host o/s of the file system device,
>   the storage daemon should detect if dedup is enabled for the file
>   device location. For Solaris / OpenSolaris ZFS, this value is
>   available through the filesystem extended properties. In this case,
>   if dedup is enabled for the ZFS filesystem, the storage daemon
>   should read the filesystem block size and use this value. (note -
>   ZFS also uses variable block sizes, and thus will only allocate the
>   required size if the requirement is less than the actual block
>   size)

The storage daemon uses a default block size of 64k. Was your ZFS fs
tuned to that blocksize?

You can configure the blocksize used by the storage daemon using the
"maximum / minimum block size" parameters which, according to the
manual, will add padding to a block to allow blocksize alignment.

Keeping one job per volume did increase our dedup ratio during testing.

> Phase 3 - Alignment of the datastream to underlying file-system
> blocks and separation of Bacula metadata into separate blocks
> - This would allow for underlying storage system deduplication
>   between both Bacula file devices and real data stored elsewhere on
>   the file-system / volume / SAN.
>

-- 
Med venlig hilsen / Best Regards

Henrik Johansen
[email protected]
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel
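
For illustration only, here is a minimal Python sketch of the alignment
issue discussed in this thread. It does not model the real Bacula volume
format or ZFS internals: the job headers, file sizes, and helper names
are all hypothetical, and dedup is approximated as hashing fixed-size
blocks. The 8k block size is taken from the proposal's example
DedupBlockSize. Two "full backups" of the same files are packed either
back-to-back or with each piece padded to a block boundary, and the
fraction of blocks shared between the two runs is compared.

```python
import hashlib
import random

BLOCK = 8 * 1024  # the proposal's example value, DedupBlockSize = 8k


def random_bytes(rng, n):
    """Return n pseudo-random bytes (stand-in for unique file content)."""
    return rng.getrandbits(8 * n).to_bytes(n, "big")


def pad(data, block=BLOCK):
    """Zero-pad data up to the next multiple of block."""
    return data + b"\0" * (-len(data) % block)


def pack(header, files, aligned):
    """Concatenate a job header and file data, optionally block-aligned."""
    pieces = [header, *files]
    if aligned:
        pieces = [pad(p) for p in pieces]
    return b"".join(pieces)


def block_hashes(stream, block=BLOCK):
    """Hash every fixed-size block, as a simple block-level dedup model."""
    return {hashlib.sha256(stream[i:i + block]).digest()
            for i in range(0, len(stream), block)}


def shared_fraction(first, second):
    """Fraction of the second stream's distinct blocks already in the first."""
    old, new = block_hashes(first), block_hashes(second)
    return len(old & new) / len(new)


rng = random.Random(0)
files = [random_bytes(rng, rng.randrange(20_000, 60_000)) for _ in range(20)]

# Hypothetical per-job metadata; the one-byte length difference shifts
# every following byte in the back-to-back layout.
header_1 = b"job=1 level=Full\n"
header_2 = b"job=27 level=Full\n"

unaligned = shared_fraction(pack(header_1, files, False),
                            pack(header_2, files, False))
aligned = shared_fraction(pack(header_1, files, True),
                          pack(header_2, files, True))

print(f"back-to-back: {unaligned:.0%} of blocks dedup")
print(f"padded:       {aligned:.0%} of blocks dedup")
```

In this toy model a one-byte change in the leading metadata is enough to
misalign every later block in the back-to-back layout, while the padded
layout keeps the file blocks identical between runs and only the header
block differs. Real results will depend on the actual volume format and
on how the storage daemon and file system block sizes are configured.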
