On Thu, Jan 6, 2011 at 12:36 AM, Josef Bacik <jo...@redhat.com> wrote:
> Here are patches to do offline deduplication for Btrfs. It works well for the
> cases it's expected to; I'm looking for feedback on the ioctl interface and
> such. I'm well aware there are missing features for the userspace app (like
> being able to set a different blocksize). If this interface is acceptable I
> will flesh out the userspace app a little more, but I believe the kernel side
> is ready to go.
>
> Basically I think online dedup is a huge waste of time and completely useless.
> You are going to want to do different things with different data. For
> example, for a mailserver you are going to want very small blocksizes, but
> for, say, a virtualization image store you are going to want much larger
> blocksizes. And let's not get into heterogeneous environments; those just get
> much too complicated. So my solution is batched dedup, where a user just runs
> this command and it dedups everything at that point. This avoids the very
> costly overhead of having to hash and look up duplicate extents online, and
> lets us be _much_ more flexible about what we want to deduplicate and how we
> want to do it.
>
> The userspace app currently only does 64k blocks, or whatever the largest area
> it can read out of a file is. I'm going to extend it to do the following
> things in the near future:
>
> 1) Take the blocksize as an argument so we can have bigger/smaller blocks
> 2) Have an option to _only_ honor the blocksize, and not try to dedup smaller
>    blocks
> 3) Use fiemap to try to dedup extents as a whole and just ignore specific
>    blocksizes
> 4) Use fiemap to determine the most optimal blocksize for the data you want
>    to dedup
>
> I've tested this out on my setup and it seems to work well. I appreciate any
> feedback you may have. Thanks,
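The user-space half of the batched approach described above (read fixed-size blocks, hash them, then find byte-identical candidates) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the patch set: the function name `find_duplicate_blocks`, the in-memory buffer input, and the FNV-1a hash are hypothetical choices made to keep the example self-contained. A hash match is always confirmed with `memcmp`, and a trailing partial block is skipped, which loosely corresponds to item 2 ("only honor the blocksize").

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Default block size from the mail above (64 KiB); the caller may pass another. */
#define DEDUP_BLOCK_SIZE (64 * 1024)

struct block_ref {
    uint64_t hash;  /* hash of the block's contents */
    size_t index;   /* block number within the buffer */
};

/* FNV-1a, 64-bit: a simple non-cryptographic hash (assumed choice). */
static uint64_t fnv1a64(const unsigned char *p, size_t n)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Order by hash, then by block index, so equal hashes form runs. */
static int cmp_ref(const void *a, const void *b)
{
    const struct block_ref *x = a, *y = b;
    if (x->hash != y->hash)
        return x->hash < y->hash ? -1 : 1;
    return x->index < y->index ? -1 : (x->index > y->index);
}

/*
 * Hypothetical helper. Splits data[0..len) into full blocks of bs bytes and
 * sets dup_of[i] to the lowest-numbered block whose bytes are identical to
 * block i (or i itself if it is unique). dup_of must hold len / bs entries.
 * Returns the number of blocks that duplicate an earlier block.
 */
size_t find_duplicate_blocks(const unsigned char *data, size_t len,
                             size_t bs, size_t *dup_of)
{
    size_t nblocks, dups = 0;
    struct block_ref *refs;

    assert(bs > 0);
    nblocks = len / bs;  /* a trailing partial block is ignored */
    if (nblocks == 0)
        return 0;
    for (size_t i = 0; i < nblocks; i++)
        dup_of[i] = i;

    refs = malloc(nblocks * sizeof(*refs));
    if (refs == NULL)
        return 0;  /* out of memory: report no duplicates */
    for (size_t i = 0; i < nblocks; i++) {
        refs[i].hash = fnv1a64(data + i * bs, bs);
        refs[i].index = i;
    }
    qsort(refs, nblocks, sizeof(*refs), cmp_ref);

    for (size_t i = 1; i < nblocks; i++) {
        /* Walk back through the run of equal hashes; confirm with memcmp
         * so that a hash collision is never treated as a duplicate. */
        for (size_t j = i; j-- > 0 && refs[j].hash == refs[i].hash; ) {
            size_t a = refs[j].index, b = refs[i].index;
            if (memcmp(data + a * bs, data + b * bs, bs) == 0) {
                dup_of[b] = dup_of[a];
                dups++;
                break;
            }
        }
    }
    free(refs);
    return dups;
}
```

Each (block, `dup_of[block]`) pair with different indices is a candidate that a tool would then hand to the kernel; how that happens is exactly the ioctl interface under discussion, so it is not shown here.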
FYI: the existing clone ioctl can already do the same thing; the difference is that reading the data and computing the hashes would then happen in user space.

Yan, Zheng
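The clone-based alternative mentioned in the reply can be sketched as a single ioctl call. The sketch uses `FICLONERANGE` and `struct file_clone_range` from `<linux/fs.h>`, the generic names (Linux 4.5 and later) for what was `BTRFS_IOC_CLONE_RANGE` with `struct btrfs_ioctl_clone_range_args` in the Btrfs ioctl header at the time of this thread; the helper name `clone_range` is hypothetical. Clone performs no comparison: it makes the destination range share the source's extents unconditionally, so a user-space dedup tool must read and compare both ranges itself beforehand, and the data can still change between that comparison and the ioctl.

```c
#include <errno.h>
#include <linux/fs.h>   /* FICLONERANGE, struct file_clone_range (Linux >= 4.5) */
#include <stdint.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: make [dst_off, dst_off + len) of dst_fd share the
 * extents backing [src_off, src_off + len) of src_fd. The contents are NOT
 * compared; the caller must have verified they are identical. Offsets and
 * length typically need to be filesystem-block aligned.
 * Returns 0 on success or -errno on failure.
 */
int clone_range(int src_fd, uint64_t src_off, uint64_t len,
                int dst_fd, uint64_t dst_off)
{
    struct file_clone_range args = {
        .src_fd = src_fd,
        .src_offset = src_off,
        .src_length = len,
        .dest_offset = dst_off,
    };

    /* The ioctl is issued on the destination file descriptor. */
    if (ioctl(dst_fd, FICLONERANGE, &args) < 0)
        return -errno;
    return 0;
}
```

Because the check and the clone are separate steps, this approach leaves a window in which the source may be modified after it was compared, which is one reason to prefer an interface where the kernel itself does the comparison.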