On Thu, Jan 6, 2011 at 12:36 AM, Josef Bacik <jo...@redhat.com> wrote:
> Here are patches to do offline deduplication for Btrfs.  It works well for the
> cases it's expected to.  I'm looking for feedback on the ioctl interface and
> such; I'm well aware there are missing features in the userspace app (like
> being able to set a different blocksize).  If this interface is acceptable I
> will flesh out the userspace app a little more, but I believe the kernel side
> is ready to go.
>
> Basically I think online dedup is a huge waste of time and completely useless.
> You are going to want to do different things with different data.  For
> example, for a mailserver you are going to want very small blocksizes, but
> for, say, a virtualization image store you are going to want much larger
> blocksizes.  And let's not get into heterogeneous environments; those just get
> much too complicated.  So my solution is batched dedup, where a user just runs
> this command and it dedups everything at this point.  This avoids the very
> costly overhead of having to hash and look up duplicate extents online and
> lets us be _much_ more flexible about what we want to deduplicate and how we
> want to do it.
>
> For now the userspace app only does 64k blocks, or whatever the largest area
> is that it can read out of a file.  I'm going to extend it to do the following
> things in the near future:
>
> 1) Take the blocksize as an argument so we can have bigger/smaller blocks
> 2) Have an option to _only_ honor the blocksize and not try to dedup smaller
> blocks
> 3) Use fiemap to try to dedup extents as a whole and just ignore specific
> blocksizes
> 4) Use fiemap to determine the optimal blocksize for the data you want to
> dedup.
>
> I've tested this out on my setup and it seems to work well.  I appreciate any
> feedback you may have.  Thanks,
>

FYI: Using the clone ioctl can do the same thing (except for reading the data
and computing the hash in user space).

Yan, Zheng
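
For readers following the thread, here is a minimal sketch of how a user-space
tool might combine block hashing with the existing clone ioctl. It is not the
code from the patches. The helper names (find_duplicate_blocks, clone_range),
the choice of SHA-256, and the Python packaging are assumptions made for
illustration. The ioctl number is derived from the _IOW(0x94, 13, ...)
definition of BTRFS_IOC_CLONE_RANGE and assumes the common ioctl encoding used
on x86 and ARM.

```python
import fcntl
import hashlib
import struct
from collections import defaultdict

BLOCK_SIZE = 64 * 1024  # the 64k block size mentioned in the quoted message

# struct btrfs_ioctl_clone_range_args:
#   { __s64 src_fd; __u64 src_offset; __u64 src_length; __u64 dest_offset; }
CLONE_ARGS = struct.Struct("=qQQQ")

# _IOW(0x94, 13, struct btrfs_ioctl_clone_range_args), assuming the x86/ARM
# ioctl bit layout (write direction = 1 << 30, size in bits 16..29).
BTRFS_IOC_CLONE_RANGE = (1 << 30) | (CLONE_ARGS.size << 16) | (0x94 << 8) | 13


def find_duplicate_blocks(path, block_size=BLOCK_SIZE):
    """Return {digest: [offsets]} for full blocks that occur more than once."""
    offsets_by_digest = defaultdict(list)
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(block_size)
            if not block:
                break
            if len(block) == block_size:  # ignore a short tail block
                digest = hashlib.sha256(block).digest()
                offsets_by_digest[digest].append(offset)
            offset += len(block)
    return {d: offs for d, offs in offsets_by_digest.items() if len(offs) > 1}


def clone_range(src_fd, src_offset, length, dest_fd, dest_offset):
    """Ask Btrfs to make the destination range share the source's extents.

    The ioctl is issued on the destination descriptor. It does not compare
    the two ranges, so the caller must be sure they hold identical data.
    """
    args = CLONE_ARGS.pack(src_fd, src_offset, length, dest_offset)
    fcntl.ioctl(dest_fd, BTRFS_IOC_CLONE_RANGE, args)
```

A caller would run find_duplicate_blocks() over a file and then call
clone_range() for each duplicate offset. Since the clone ioctl does not check
the data itself, a careful tool would byte-compare the two blocks first and
make sure the file is not modified between the comparison and the clone.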