On 01/05/2011 07:46 PM, Josef Bacik wrote:

Blah blah blah, I'm not having an argument about which is better because I
simply do not care.  I think dedup is silly to begin with, and online dedup even
sillier.

Offline dedup is more expensive, so why are you of the opinion that it is less silly? And ranking the two by silliness quotient still sounds like an argument over which is better.

The only reason I did offline dedup was because I was just toying
around with a simple userspace app to see exactly how much I would save if I did
dedup on my normal system, and with 107 gigabytes in use, I'd save 300
megabytes.  I'll say that again, with 107 gigabytes in use, I'd save 300
megabytes.  So in the normal user case dedup would have been wholly useless to
me.
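
That sort of measurement is easy enough to reproduce. Here is a minimal sketch of one way to do it (Python, purely illustrative, not your tool): it hashes fixed-size blocks under a directory and totals the copies beyond the first, with the blocksize as the knob discussed further down:

import hashlib, os, sys
from collections import Counter

def potential_saving(root, blocksize=4096):
    counts = Counter()
    for dirpath, _, files in os.walk(root):
        for name in files:
            try:
                with open(os.path.join(dirpath, name), 'rb') as f:
                    while True:
                        block = f.read(blocksize)
                        if not block:
                            break
                        counts[hashlib.sha256(block).digest()] += 1
            except OSError:
                continue                     # skip unreadable files
    # every copy of a block beyond the first is a candidate for dedup
    return sum(n - 1 for n in counts.values()) * blocksize

if __name__ == '__main__':
    bs = int(sys.argv[2]) if len(sys.argv) > 2 else 4096
    print("%.1f MB" % (potential_saving(sys.argv[1], bs) / 1e6))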

Dedup isn't for the average desktop user. Dedup is for backup storage and virtual machine images. I don't remember anyone ever saying it is for the average desktop user. I am amazed you got even that much saving; I wouldn't expect there to be any duplicate files on a normal system. Compression is a feature desktop users would benefit from, not deduplication.

Dedup is only useful if you _know_ you are going to have duplicate information,
so the two major use cases that come to mind are

1) Mail server.  You have small files, probably less than 4k (blocksize) that
you are storing hundreds to thousands of.  Using dedup would be good for this
case, and you'd have to have a small dedup blocksize for it to be useful.

Explain to me why you think this would yield duplicate blocks. If your server uses Maildir, the headers are stored in the mail files themselves, and because the mails went to different users, they'd have different headers and thus not be dedupable.

2) Virtualized guests.  If you have 5 different RHEL5 virt guests, chances are
you are going to share data between them, but unlike with the mail server
example, you are likely to find much larger chunks that are the same, so you'd
want a larger dedup blocksize, say 64k.  You want this because if you did just
4k you'd end up with a ridiculous amount of fragmentation and performance would
go down the toilet, so you need a larger dedup blocksize to make for better
performance.

Fragmentation will cause you problems anyway; the argument in the UNIX world since year dot has been that defragging doesn't make a damn's worth of difference when you have a hundred users hammering away on a machine that has to seek between all their collective files.

If you have VM image files a la vmware/xen/kvm, then using blocks of the same size as the guest filesystem's is the only way you are going to get sane deduplication performance; otherwise the blocks won't line up. If the dedup block size is 4KB and the guest fs block size is 4KB, that's a reasonably clean case.
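
A quick illustration of the alignment point (a sketch, assuming nothing about any particular hypervisor): the same data shifted by a fraction of a block produces no matching block hashes at all.

import hashlib, os

payload = os.urandom(1 << 20)            # 1 MiB standing in for guest data
image_a = payload                        # aligned copy
image_b = os.urandom(512) + payload      # same data shifted by 512 bytes

def block_hashes(data, bs=4096):
    return {hashlib.sha256(data[i:i + bs]).digest()
            for i in range(0, len(data), bs)}

shared = block_hashes(image_a) & block_hashes(image_b)
print(len(shared))                       # 0: nothing lines up, so nothing dedups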

The biggest win by far, however, would be when using chroot type guests, as I mentioned.

So you'd want an online implementation to give you a choice of dedup blocksize,
which seems to me to be overly complicated.

I'd just make it always use the fs block size. No point in making it variable.

And then let's bring up the fact that you _have_ to manually compare any data you
are going to dedup.  I don't care if you think you have the greatest hashing
algorithm known to man, you are still going to have collisions somewhere at some
point, so in order to make sure you don't lose data, you have to manually memcmp
the data.  So if you are doing this online, that means reading back the copy you
want to dedup in the write path so you can do the memcmp before you write.  That
is going to make your write performance _suck_.
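
For clarity, here is a minimal sketch of that verify-before-share step (Python, with a hypothetical in-memory index, nothing to do with the btrfs code); the cost you are describing is that 'existing' has to come back off the disk in the write path:

import hashlib

# hash digest -> block bytes already written (stand-in for an on-disk extent)
dedup_index = {}

def write_block(data):
    digest = hashlib.sha256(data).digest()
    existing = dedup_index.get(digest)
    # the "manual memcmp": online, 'existing' must be read back from disk here
    if existing is not None and existing == data:
        return existing                  # genuine duplicate, share the stored copy
    # first occurrence, or a hash collision: keep this block as its own copy
    dedup_index.setdefault(digest, data)
    return data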

IIRC, this is configurable in ZFS, so you can switch off the physical block comparison. If you use SHA256, the chances of a collision (unless SHA is broken, in which case we have much bigger problems) are about 1 in 2^128, i.e. you would need on the order of 2^128 blocks before a collision becomes likely. At 4KB per block, that is one collision in roughly 10^24 exabytes. That's a trillion trillion exabytes, which is considerably more storage space than there is likely to be available on the planet for some time. And just for good measure, you could always up the hash to SHA512 or use two different hashes (e.g. a combination of SHA256 and MD5).
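
Rough arithmetic behind that figure (a sketch, taking 2^128 blocks as the point where a birthday collision becomes plausible):

blocks = 2 ** 128                        # birthday bound for a 256-bit hash
total_bytes = blocks * 4096              # at 4KB per block
print(total_bytes / 1e18)                # ~1.4e24 exabytes, i.e. ~10^24 EB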

Do I think offline dedup is awesome?  Hell no, but I got distracted doing it as
a side project so I figured I'd finish it, and I did it in under 1400 lines.  I
dare you to do the same with an online implementation.  Offline is simpler to
implement and simpler to debug if something goes wrong, and has an overall
easier-to-control impact on the system.

If you're not going to do it properly, it is also better done outside the FS, e.g. with FL-COW or the FUSE-based lessfs.

Gordan