On Tue, 8 Jul 2008, Nathan Kroenert wrote:

> Even better would be using the ZFS block checksums (assuming we are only
> summing the data, not its position or time :)...
>
> Then we could have two files that have 90% the same blocks, and still
> get some dedup value... ;)

It seems that the hard problem is not whether ZFS has the structure to 
support it (the implementation seems pretty obvious), but rather that 
ZFS is supposed to scale to extremely large sizes.  If you have a 
petabyte of storage in the pool, then the data structure needed to 
track duplicate blocks could grow exceedingly large.  The block 
checksums are designed to be as random as possible, so their values 
suggest nothing about the similarity of the data unless they are 
identical.  The checksums have enough bits and randomness that a 
binary tree over them would not scale.
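As a rough back-of-the-envelope illustration of that scaling concern (my
own sketch, not anything taken from ZFS itself), assume 128 KiB blocks
and a few hundred bytes of bookkeeping per tracked checksum; a petabyte
pool then implies terabytes of dedup metadata:

    # Hypothetical sizing of a checksum-indexed dedup table for a 1 PiB pool.
    # The block size and per-entry overhead are assumptions for illustration,
    # not values from the ZFS on-disk format.

    POOL_BYTES = 1 << 50          # 1 PiB of data in the pool
    BLOCK_BYTES = 128 * 1024      # assume 128 KiB blocks (common recordsize)
    ENTRY_BYTES = 320             # assumed cost per tracked checksum entry

    num_blocks = POOL_BYTES // BLOCK_BYTES
    table_bytes = num_blocks * ENTRY_BYTES

    print(f"blocks tracked: {num_blocks:,}")                    # ~8.6 billion
    print(f"table size:     {table_bytes / (1 << 40):.1f} TiB") # ~2.5 TiB

Whether a table of that size has to live in RAM or can be paged in from
disk is exactly the kind of trade-off that makes this hard at scale.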

Except for the special case of backups or cloned server footprints, 
it does not seem that data deduplication will save the 90% (or more) 
of space that Quantum claims at 
http://www.quantum.com/Solutions/datadeduplication/Index.aspx.

ZFS clones already provide a form of data deduplication.

The actual benefit of data deduplication to an enterprise seems 
negligible unless the backup system directly supports it.  In the 
enterprise, the cost of storage has more to do with backing up the data 
than with the amount of storage media consumed.

Bob
======================================
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
