Thank you, Matt, for your comments; answers inline.

On 6/15/20 09:18, Matthew Ahrens via openzfs-developer wrote:
> Cool!  Couple of questions/observations:
> 
> Do I understand correctly that the new data structure you're proposing
> (the BRT) maps from DVA to refcount?

Correct.

> If so, and we can keep this data structure sorted on disk (by DVA), we
> would be more likely to get multiple useful entries when reading one
> block of the BRT.  That would reduce the pathologies of the DDT (where
> each block of the DDT contains random entries).

Exactly. Plus, the on-disk BRT entry is 24 bytes and the in-memory one is
80 bytes for now (vs. 392 bytes for a DDT entry). Smaller structures just
delay the problem, of course, but sorting can be a big win.
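
To put those sizes in perspective, here is a rough back-of-the-envelope
sketch. Only the 24/80/392 byte figures come from the numbers above; the
128 KiB table block size and the one million tracked blocks are
assumptions picked purely for illustration, and since it is not stated
whether the 392 bytes are on-disk or in-memory, the comparison is
approximate.

```python
# Entry sizes quoted above, in bytes.
BRT_ONDISK = 24
BRT_INCORE = 80
DDT_ENTRY = 392

# Illustrative assumptions, not taken from the proposal.
ASSUMED_BLOCK = 128 * 1024     # hypothetical size of one on-disk table block
ASSUMED_ENTRIES = 1_000_000    # hypothetical number of tracked blocks


def entries_per_block(entry_size, block_size=ASSUMED_BLOCK):
    """How many whole entries of entry_size fit in one table block."""
    return block_size // entry_size


def table_bytes(entry_size, entries=ASSUMED_ENTRIES):
    """Total size of a table holding `entries` entries of entry_size."""
    return entry_size * entries


brt_per_block = entries_per_block(BRT_ONDISK)
brt_incore_total = table_bytes(BRT_INCORE)
ddt_total = table_bytes(DDT_ENTRY)
```

With a table sorted by DVA, a single block read under these assumptions
brings in several thousand neighbouring entries, which is where the
locality win would come from.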

Also note that this table only grows when you explicitly clone a block,
and not for every block as in the dedup case.

In addition, when you move a file between datasets, the BRT entries
exist only temporarily: we create them when creating the destination
copy, but remove them when we remove the old copy.

All in all, I'd expect this table to be much, much smaller than the DDT.
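
The lifecycle described above can be sketched with a toy model. This is
not the proposed implementation: the class and method names are
hypothetical, DVAs are modelled as plain (vdev, offset) tuples, and the
choice to count only the extra references (so a block with no entry has
exactly one owner) is an assumption made for the example.

```python
class ToyBRT:
    """Toy block reference table: maps DVA -> number of extra references."""

    def __init__(self):
        self.refs = {}

    def clone(self, dva):
        # Only an explicit clone adds or bumps an entry.
        self.refs[dva] = self.refs.get(dva, 0) + 1

    def free(self, dva):
        """Drop one reference; return True if the block can really be freed."""
        count = self.refs.get(dva, 0)
        if count == 0:
            return True          # no entry: this was the only reference
        if count == 1:
            del self.refs[dva]   # last extra reference gone, drop the entry
        else:
            self.refs[dva] = count - 1
        return False


brt = ToyBRT()
blocks = [(0, 4096), (0, 8192)]

# "Move" a file between datasets: clone into the destination first...
for dva in blocks:
    brt.clone(dva)
entries_during_move = len(brt.refs)

# ...then remove the old copy. Nothing is freed, but the entries go away.
freed = [brt.free(dva) for dva in blocks]
entries_after_move = len(brt.refs)
```

In this model the table is empty again once the move completes, and a
later free of the destination copy finds no entry and releases the block.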

> However, even so, looking up in the BRT for every single zio_free()
> would be a substantial cost.  I imagine that in practice, the BRT would
> need to be fully cached to get good performance.  In practice the
> substantial difference from using the DDT may only be that we don't have
> to use a strong checksum (because it's indexed by DVA instead of
> checksum).  Aside from that, you could use the DDT, and assume that if
> we don't find an entry, it has an effective refcount of 1.

In theory yes, but in practice I'd expect that a lot of hacking would be
needed to make the DDT not operate on checksums, not depend on the D
bit, etc.

> I think there could be a more efficient solution to a subset of the
> problems that this tackles.  For example, here is an incomplete idea for
> file cloning: We could create a new "file clone family refcount" (FCFR)
> data structure when each file is cloned.  The FCFR would map from
> blockID -> refcount.  Each object would have a (normally empty) pointer
> to its FCFR data structure.  This way, only files that are cloned would
> pay any performance penalty.  And each cloned file's data structure is
> independent, so manipulating one cloned file doesn't have to deal with a
> huge (pool-wide) data structure.

I'm afraid I don't fully understand the idea. When you clone a block
from file A to file B, how do you modify the FCFR of file A if file A is
only accessible via a snapshot?

-- 
Pawel Jakub Dawidek

------------------------------------------
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/Te62797341aee0806-Mdc5a98220d49e65e70df14e8