On ke, 2011-01-05 at 14:46 -0500, Josef Bacik wrote:
> Blah blah blah, I'm not having an argument about which is better because I
> simply do not care.  I think dedup is silly to begin with, and online dedup
> even sillier.  The only reason I did offline dedup was because I was just
> toying around with a simple userspace app to see exactly how much I would
> save if I did dedup on my normal system, and with 107 gigabytes in use, I'd
> save 300 megabytes.  I'll say that again, with 107 gigabytes in use, I'd save
> 300 megabytes.  So in the normal user case dedup would have been wholly
> useless to me.

I have been thinking a lot about de-duplication for a backup application
I am writing. I wrote a little script to figure out how much it would
save me. For my laptop home directory, about 100 GiB of data, it was a
couple of percent, depending a bit on the size of the chunks. With 4 KiB
chunks, I would save about two gigabytes. (That's assuming no MD5 hash
collisions.) I don't have VM images, but I do have a fair bit of saved
e-mail. So, for backups, I concluded it was worth it to provide an
option to do this. I have no opinion on whether it is worthwhile to do
in btrfs.
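
The kind of estimate described above (split files into fixed-size chunks,
hash each chunk, count bytes in chunks whose hash was already seen) can be
sketched roughly as follows. This is NOT the find-duplicate-chunks script
from the obnam tarball; it is a minimal illustration assuming chunks are
aligned at fixed offsets within each file, MD5 is the digest, and there are
no hash collisions. The names chunk_digests and estimate_savings are
hypothetical.

```python
import hashlib
import os

CHUNK_SIZE = 4096  # 4 KiB chunks, as in the estimate above (assumption)


def chunk_digests(path, chunk_size=CHUNK_SIZE):
    """Yield (md5 digest, length) for each fixed-size chunk of a file.

    Chunks start at offsets 0, chunk_size, 2*chunk_size, ...; the last
    chunk of a file may be shorter than chunk_size.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield hashlib.md5(chunk).digest(), len(chunk)


def estimate_savings(root, chunk_size=CHUNK_SIZE):
    """Walk root and return (total_bytes, duplicate_bytes).

    duplicate_bytes counts every chunk whose digest was already seen,
    i.e. the space dedup could save if identical digests always meant
    identical data (no collisions). Symlinks are skipped; hard links
    are not detected and would be counted as duplicates.
    """
    seen = set()
    total = 0
    duplicate = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                for digest, size in chunk_digests(path, chunk_size):
                    total += size
                    if digest in seen:
                        duplicate += size
                    else:
                        seen.add(digest)
            except OSError:
                # Unreadable file: skip the rest of it.
                continue
    return total, duplicate
```

For example, estimate_savings(os.path.expanduser("~")) returns the scanned
byte count and the bytes that fall in repeated chunks; the ratio of the two
is the "couple of percent" style figure. Changing chunk_size shows how the
result depends on chunk size.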

(For my script, see find-duplicate-chunks in
http://code.liw.fi/debian/pool/main/o/obnam/obnam_0.14.tar.gz or get the
current code using "bzr get http://code.liw.fi/obnam/bzr/trunk/".
http://braawi.org/obnam/ is the home page of the backup app.)

-- 
Blog/wiki/website hosting with ikiwiki (free for free software):
http://www.branchable.com/