On Mon, Jan 10, 2011 at 10:28:14AM -0500, Ric Wheeler wrote:
>
> I think that dedup has a variety of use cases that are all very dependent
> on your workload. The approach you have here seems to be a quite
> reasonable one.
>
> I did not see it in the code, but it is great to be able to collect
> statistics on how effective your hash is and any counters for the extra
> IO imposed.
>
So I have counters for how many extents are deduped and for the overall
space savings; is that what you are talking about?

> Also very useful to have a paranoid mode where, when you see a hash
> collision (dedup candidate), you fall back to a byte-by-byte compare to
> verify that the collision is correct. Keeping stats on how often
> this is a false collision would be quite interesting as well :)
>

So I've always done a byte-by-byte compare, first in userspace but now it's
in the kernel, because frankly I don't trust hashing algorithms with my
data. It would be simple enough to keep statistics on how often the
byte-by-byte compare finds a mismatch, but really that check is there to
catch changes to the file, so I suspect most of those statistics would just
mean the file changed, not that the hash was a collision.

Thanks,

Josef
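P.S. For anyone curious, the paranoid path boils down to something like the
sketch below (an untested userspace-style sketch with made-up names, not
the actual patch): compare hashes first, do the full memcmp only on a hash
match, and bump a counter whenever the memcmp disagrees. As discussed
above, that counter lumps together real collisions and files that changed
underneath us.

    /* Hypothetical sketch, not the actual btrfs dedup code. */
    #include <string.h>

    struct dedup_stats {
            unsigned long extents_deduped;  /* extents merged so far */
            unsigned long bytes_saved;      /* space reclaimed */
            unsigned long hash_mismatches;  /* memcmp disagreed after a
                                               hash match: collision or
                                               the file changed */
    };

    /* Returns 1 if the two extents really are identical, 0 otherwise. */
    static int dedup_candidate_matches(const unsigned char *a_hash,
                                       const unsigned char *b_hash,
                                       size_t hash_len,
                                       const char *a_data,
                                       const char *b_data,
                                       size_t data_len,
                                       struct dedup_stats *stats)
    {
            if (memcmp(a_hash, b_hash, hash_len) != 0)
                    return 0;       /* hashes differ: not a candidate */

            if (memcmp(a_data, b_data, data_len) != 0) {
                    /* Hash said equal, bytes say otherwise. */
                    stats->hash_mismatches++;
                    return 0;
            }

            return 1;               /* safe to dedup */
    }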