On Wed, Apr 10, 2013 at 02:05:32PM +0200, Marek Otahal wrote:
> Hello,
> this is awesome news! Thank you for working on dedup.
Hi,

Your previous thread floating on the list did give me some light, thanks :)

> I have some questions about the dedup approach in regard to other
> layers/features.
>
> 1/ How will snapshots be handled?
>
> Will data be deduped between snapshots (potentially a big saved-space
> ratio), or will snapshots be considered isolated? Best, if this could be
> set by the user. My concern is about being error-prone: with deduped
> snapshots, only one copy of the data would actually exist, and a
> corruption would damage it along with all snapshots. Or is this not a
> problem and we say "safety" is handled by RAID?

I don't think that snapshots should be responsible for data security; even
without dedup, a snapshot's files can still share the same disk
space/extents with the source. So I won't treat dedup on snapshots as a
special case.

> 2/ Order of dedup/compression?
>
> What would be done first, compress a file and then compare blocks for
> duplicates, or the other way around?
>
> Dedup first would save some compression work:
>   file's block 0000000000 -> hash -> isDup? (if no) -> compress (10x0) -> write
> but the problem is that the written data size is unknown (it's not the
> one block at the start).
>
> The other way, compress first, would waste compression CPU work on
> duplicate blocks, but would reduce dedup-related metadata usage, as one
> million zeros would be compressed to a single block and only that one is
> compared/written. Usefulness here depends on the compression ratio of
> the file.
>
> I'm not sure which approach would be better?

In my opinion, dedup is a special kind of compression, and it's not worth
doing dedup on compressed data, as I want to keep this feature simple. I
prefer to dedup first rather than compress first (but I could be wrong).

thanks,
liubo

> Thank you for your time and explanation.
> Best wishes, Mark
>
> On Sunday 07 April 2013 21:12:48 Liu Bo wrote:
> > (NOTE: This leads to a FORMAT CHANGE, DO NOT use it on real data.)
> >
> > This introduces the online data deduplication feature for btrfs.
> >
> > (1) WHY do we need deduplication?
> >     To improve our storage efficiency.
> >
> > (2) WHAT is deduplication?
> >     There are two key axes for practical deduplication implementations:
> >     * when the data is deduplicated
> >       (inband vs background), and
> >     * the granularity of the deduplication
> >       (block level vs file level).
> >
> >     For btrfs, we choose
> >     * inband (synchronous)
> >     * block level
> >
> >     We choose these for the same reasons zfs does:
> >     a) to get an immediate benefit, and
> >     b) to remove redundant parts within a file.
> >
> >     So we have an inband, block-level data deduplication here.
> >
> > (3) HOW does deduplication work?
> >     It makes full use of file extent back references, the same way as
> >     IOCTL_CLONE, which lets us easily store multiple copies of a set of
> >     data as a single copy along with an index of references to the copy.
> >
> >     Here we have
> >     a) a new dedicated tree (the DEDUP tree), and
> >     b) a new key (BTRFS_DEDUP_ITEM_KEY), which consists of
> >        (stop 64 bits of hash, type, disk offset).
> >        * stop 64 bits of hash
> >          It comes from sha256, which is very helpful in avoiding
> >          collisions. We take these 64 bits as the index.
> >        * disk offset
> >          It helps to find where the data is stored.
> >
> >     So the whole deduplication process works as:
> >     1) write something,
> >     2) calculate the hash of this "something",
> >     3) try to find a match for the hash value by searching DEDUP keys in
> >        the dedicated DEDUP tree,
> >     4) if found, skip real IO and link to the existing copy;
> >        if not, do real IO and insert a DEDUP key into the DEDUP tree.
> >
> >     For now, we limit the deduplication unit to PAGESIZE, 4096, and we're
> >     going to increase this unit dynamically in the future.
> >
> > Signed-off-by: Liu Bo <bo.li....@oracle.com>
> --
> Marek Otahal :o)