On Mon, Jan 10, 2011 at 10:28:14AM -0500, Ric Wheeler wrote:
>
> I think that dedup has a variety of use cases that are all very dependent
> on your workload. The approach you have here seems to be a quite
> reasonable one.
>
> I did not see it in the code, but it is great to be able to collect
> statistics on how effective your hash is and any counters for the extra
> IO imposed.
>
So I have counters for how many extents are deduped and for the overall
space savings; is that what you are talking about?

> Also very useful to have a paranoid mode where, when you see a hash
> collision (dedup candidate), you fall back to a byte-by-byte compare to
> verify that the collision is correct. Keeping stats on how often
> this is a false collision would be quite interesting as well :)
>

So I've always done a byte-by-byte compare, first in userspace but now it's
in the kernel, because frankly I don't trust hashing algorithms with my
data. It would be simple enough to keep statistics on how often the
byte-by-byte compare finds a mismatch, but really that check is there to
catch changes to the file, so I suspect most of those statistics would just
mean the file changed, not that the hash was a collision.

Thanks,

Josef
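P.S. For anyone curious, the paranoid path boils down to something like the
sketch below (an untested userspace-style sketch with made-up names, not
the actual patch): compare hashes first, do the full memcmp only on a hash
match, and bump a counter whenever the memcmp disagrees. As discussed
above, that counter lumps together real collisions and files that changed
underneath us.

    /* Hypothetical sketch, not the actual btrfs dedup code. */
    #include <string.h>

    struct dedup_stats {
            unsigned long extents_deduped;  /* extents merged so far */
            unsigned long bytes_saved;      /* space reclaimed */
            unsigned long hash_mismatches;  /* memcmp disagreed after a
                                               hash match: collision or
                                               the file changed */
    };

    /* Returns 1 if the two extents really are identical, 0 otherwise. */
    static int dedup_candidate_matches(const unsigned char *a_hash,
                                       const unsigned char *b_hash,
                                       size_t hash_len,
                                       const char *a_data,
                                       const char *b_data,
                                       size_t data_len,
                                       struct dedup_stats *stats)
    {
            if (memcmp(a_hash, b_hash, hash_len) != 0)
                    return 0;       /* hashes differ: not a candidate */

            if (memcmp(a_data, b_data, data_len) != 0) {
                    /* Hash said equal, bytes say otherwise. */
                    stats->hash_mismatches++;
                    return 0;
            }

            return 1;               /* safe to dedup */
    }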