Ondřej Bílka wrote:
Then again, for a lot of use-cases there are perhaps better ways to
achieve the intended goal than deduping at the FS level, e.g. snapshotting or
something like fl-cow:
http://www.xmailserver.org/flcow.html
As far as VMs are concerned, fl-cow is a poor replacement for deduping.
Depends on your VM. If your VM uses monolithic images, then you're
right. For a better solution, take a look at vserver's hashify feature
for something that does this very well in its own context.
Upgrading packages? The first VM upgrades and copies the changed files.
After a while the second one upgrades and copies its files too. More and more
data becomes duplicated again.
So you want online dedupe, then. :)
If you host multiple distributions, you need to translate:
/usr/share/bin/foo in foonux is /us/bin/bar in barux.
The chances of the binaries being the same between distros are between
slim and none. In the context of VMs where you have access to raw files,
as I said, look at vserver's hashify feature. It doesn't care about file
names, it will COW hard-link all files with identical content. This
doesn't even require an exhaustive check of all the files' contents -
you can start with file sizes. Files that have different sizes can't
have the same contents, so you can discard most of the candidates before
you even open a file; most of the work gets done based on metadata alone.
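
To make the size-first idea concrete, here is a minimal sketch - not
vserver's actual hashify code, just an illustration under assumptions I'm
making up (Python 3.8+, everything on one filesystem, and plain hard links
rather than the COW-breaking links hashify creates):

    #!/usr/bin/env python3
    # Sketch of size-first dedup: group files by size, hash only the
    # candidates that share a size, then hard-link identical content.
    import hashlib
    import os
    import sys
    from collections import defaultdict

    def sha256sum(path, bufsize=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(bufsize):
                h.update(chunk)
        return h.hexdigest()

    def dedupe(root):
        by_size = defaultdict(list)
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                if os.path.islink(path):
                    continue
                by_size[os.path.getsize(path)].append(path)

        for size, paths in by_size.items():
            if len(paths) < 2:        # unique size: contents can't match
                continue
            by_hash = defaultdict(list)
            for path in paths:        # only now do we read file contents
                by_hash[sha256sum(path)].append(path)
            for dupes in by_hash.values():
                keep, rest = dupes[0], dupes[1:]
                for path in rest:
                    os.unlink(path)
                    os.link(keep, path)  # plain hard link, no COW on write

    if __name__ == "__main__":
        dedupe(sys.argv[1])

Of course this only stays deduplicated until one copy is written to, which
is exactly why hashify pairs the hard links with copy-on-write breaking,
and why online dedupe keeps coming up in this thread.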
And the primary reason to dedupe is not to reduce space usage but to
improve caching. Why should machine A read a file from disk if machine B
read it five minutes ago?
Couldn't agree more. This is what I was trying to explain earlier. Even
if deduping did cause more fragmentation (and I don't think that is the
case to any significant extent), the improved caching efficiency would
more than offset this.
Gordan