On 12/8/16 1:36 PM, Christoph Anton Mitterer wrote:
> Hey.
> 
> I just wondered whether out-of-band/"offline" dedup is safe for general
> use... https://btrfs.wiki.kernel.org/index.php/Status kinda implies so
> (it tells about unspecified performance issues), but this seems again
> already outdated (kernel 4.7)...
> :-(

SUSE supports it in SLE12 using our 3.12- and 4.4-based kernels.  There
haven't been a lot of changes to the kernel component of it.  It's
pretty simple: check to see if the ranges are identical between two
files and then reflink between them.

> My intention was to use it with duperemove, but AFAIU, the kernel
> itself will anyway do a byte-by-byte comparison before any
> deduplication, so in principle it should be totally safe regardless of
> the stability of the userland tool, right?
> Especially I wouldn't want that "identity" is only assumed because of
> some checksum identity (or collision ;) ).

Yep.  It does a full check in the kernel for precisely that reason.
It's not even enough to do it in userspace because we don't want dedupe
to be race prone.  It's either atomically identical or it's not, and we
don't dedupe if it's not.  If it changes immediately after the ioctl
returns, that's fine -- the cloned range will be CoW'd properly.
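
For illustration, here is a minimal sketch of how a userspace tool might
call the dedupe ioctl.  It assumes the generic FIDEDUPERANGE interface
described in ioctl_fideduperange(2); the helper names (build_request,
parse_result, dedupe_range) are made up for this example, and the
hard-coded request number assumes the common ioctl encoding used on
x86 and ARM.  Treat it as a sketch, not as what duperemove actually does.

```python
import fcntl
import struct

# Assumption: _IOWR(0x94, 54, struct file_dedupe_range) with the common
# (x86/ARM) ioctl number encoding; some architectures encode it differently.
FIDEDUPERANGE = 0xC0189436

# struct file_dedupe_range: src_offset, src_length, dest_count, reserved1, reserved2
HDR = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped, status, reserved
INFO = struct.Struct("=qQQiI")

FILE_DEDUPE_RANGE_SAME = 0     # ranges were identical and are now shared
FILE_DEDUPE_RANGE_DIFFERS = 1  # kernel comparison found a difference


def build_request(src_offset, length, dest_fd, dest_offset):
    """Pack a request with a single destination range (hypothetical helper)."""
    return bytearray(
        HDR.pack(src_offset, length, 1, 0, 0)
        + INFO.pack(dest_fd, dest_offset, 0, 0, 0)
    )


def parse_result(buf):
    """Return (status, bytes_deduped) for the single destination."""
    _fd, _off, bytes_deduped, status, _rsv = INFO.unpack_from(buf, HDR.size)
    return status, bytes_deduped


def dedupe_range(src_fd, src_offset, length, dest_fd, dest_offset):
    """Ask the kernel to compare the two ranges and share them if identical."""
    buf = build_request(src_offset, length, dest_fd, dest_offset)
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)  # kernel fills in the result in place
    return parse_result(buf)
```

A caller would treat a status of FILE_DEDUPE_RANGE_DIFFERS as "the data
did not match, nothing was changed", and a negative status as an errno
value for that destination.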

> Also, is there anything to take note of when this is used with
> compression and snapshots?

I don't believe so.  IIRC dedupe maps the file to see if it's already
cloned, so it's safe for snapshots (or could relink extents in a
snapshot that diverged and then were restored to their original
contents).  Dedupe works with the uncompressed data, so compression
shouldn't matter here.  I haven't tested it, though.

> What when I use it with incremental send/receive... i.e. I dedupe the
> "master" and then send/receive this to another btrfs... will it work
> (that is will the copy be also deduplicated, with no longer needed
> extents properly being freed)... or at least not cause any corruptions?

It should.  IIRC send also maps the file (using a different mechanism)
and receive will clone those ranges on the other end.

> Any other things in terms of possible issues, data corruption, etc.
> that one should know when using deduplication?

There shouldn't be.  We haven't had any bug reports at SUSE.

-Jeff

-- 
Jeff Mahoney
SUSE Labs
