Ondřej Bílka wrote:
Then again, for a lot of use-cases there are perhaps better ways to
achieve the intended goal than deduping at the FS level, e.g. snapshotting or
something like fl-cow:
http://www.xmailserver.org/flcow.html
As far as VMs are concerned, fl-cow is a poor replacement for deduping.
Depends on your VM. If your VM uses monolithic images, then you're
right. For a better solution, take a look at vserver's hashify feature
for something that does this very well in its own context.
Upgrading packages? The first VM upgrades and copies the changed files.
After a while the second one upgrades and copies its files too. More and more
data becomes duplicated again.
So you want online dedupe, then. :)
If you host multiple distributions, you need to translate:
/usr/share/bin/foo in foonux is /us/bin/bar in barux.
The chances of the binaries being the same between distros are between
slim and none. In the context of VMs where you have access to raw files,
as I said, look at vserver's hashify feature. It doesn't care about file
names, it will COW hard-link all files with identical content. This
doesn't even require an exhaustive check of all the files' contents -
you can start with file sizes. Files that have different sizes can't
have the same contents, so you can discard most of the candidates before
you even open a file; most of the work gets done based on metadata alone.
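
To make the size-first idea concrete, here is a minimal sketch - not
vserver's actual hashify code, just an illustration under assumptions I'm
making up (Python 3.8+, everything on one filesystem, and plain hard links
rather than the COW-breaking links hashify creates):

    #!/usr/bin/env python3
    # Sketch of size-first dedup: group files by size, hash only the
    # candidates that share a size, then hard-link identical content.
    import hashlib
    import os
    import sys
    from collections import defaultdict

    def sha256sum(path, bufsize=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(bufsize):
                h.update(chunk)
        return h.hexdigest()

    def dedupe(root):
        by_size = defaultdict(list)
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                if os.path.islink(path):
                    continue
                by_size[os.path.getsize(path)].append(path)

        for size, paths in by_size.items():
            if len(paths) < 2:        # unique size: contents can't match
                continue
            by_hash = defaultdict(list)
            for path in paths:        # only now do we read file contents
                by_hash[sha256sum(path)].append(path)
            for dupes in by_hash.values():
                keep, rest = dupes[0], dupes[1:]
                for path in rest:
                    os.unlink(path)
                    os.link(keep, path)  # plain hard link, no COW on write

    if __name__ == "__main__":
        dedupe(sys.argv[1])

Of course this only stays deduplicated until one copy is written to, which
is exactly why hashify pairs the hard links with copy-on-write breaking,
and why online dedupe keeps coming up in this thread.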
And the primary reason to dedupe is not to reduce space usage but to
improve caching. Why should machine A read a file from disk if machine B
read it five minutes ago?
Couldn't agree more. This is what I was trying to explain earlier. Even
if deduping did cause more fragmentation (and I don't think that is the
case to any significant extent), the improved caching efficiency would
more than offset this.
Gordan