On Wed, Jun 17, 2020 at 3:47 PM Pawel Jakub Dawidek <pa...@dawidek.net>
wrote:

> On 6/15/20 09:18, Matthew Ahrens via openzfs-developer wrote:
> > However, even so, looking up in the BRT for every single zio_free()
> > would be a substantial cost. [...]
>
> After giving it some more thought we could avoid that cost by leveraging
> the fact that we operate on offsets within VDEVs.
>
> We could maintain a table of fixed-size regions for each VDEV. Each table
> entry is a reference counter. Let's call it the Table of Regions (ToR)...
>
> For example we divide a VDEV into 1GB regions. Each region gets its own
> 32-bit counter (21-bit counter would be enough as we can get only 2^21
> 512-byte blocks in 1GB). Every time a _new_ entry in BRT shows up, we
> increase the counter in ToR's entry for this block. Every time we free a
> block we take a look at ToR first to see if we should check BRT. If the
> counter for this region is 0 there are no entries in BRT, thus there is
> no need to consult BRT, so there is no additional cost for zio_free().
>
> ToR is extremely small. For 1GB regions and 32-bit counters it takes 4kB
> (four kilobytes) of RAM per 1TB per top-level VDEV.
>
> Note that ToR is only updated for a new entry in BRT or when an entry is
> removed from BRT. We don't update ToR when we increase the counter on an
> existing BRT entry.
>
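
To make the proposal above concrete, here is a minimal sketch of the ToR
as I read it. It is an illustration only, not code from any ZFS tree: the
class and method names (TableOfRegions, brt_entry_added, and so on) are
hypothetical, and a real implementation would live in C alongside the
zio_free() path.

```python
REGION_SIZE = 1 << 30  # 1GB regions, as in the proposal


class TableOfRegions:
    """Per-VDEV array of counters, one per fixed-size region.

    Hypothetical sketch: each counter holds the number of distinct BRT
    entries whose offset falls inside that region.
    """

    def __init__(self, vdev_size, region_size=REGION_SIZE):
        self.region_size = region_size
        # Round up so a partial last region still gets a counter.
        nregions = (vdev_size + region_size - 1) // region_size
        self.counters = [0] * nregions

    def _region(self, offset):
        return offset // self.region_size

    def brt_entry_added(self, offset):
        # Called only when a *new* BRT entry is created, not when the
        # refcount of an existing entry is bumped.
        self.counters[self._region(offset)] += 1

    def brt_entry_removed(self, offset):
        idx = self._region(offset)
        assert self.counters[idx] > 0, "ToR counter underflow"
        self.counters[idx] -= 1

    def must_check_brt(self, offset):
        # The free path asks this first; False means the BRT lookup
        # can be skipped for this block.
        return self.counters[self._region(offset)] != 0

    def size_bytes(self, counter_bytes=4):
        # In-core footprint assuming 32-bit counters.
        return len(self.counters) * counter_bytes


tor = TableOfRegions(vdev_size=1 << 40)  # one 1TB top-level VDEV
print(len(tor.counters), tor.size_bytes())
```

With a 1TB VDEV this gives 1024 counters and 4096 bytes, which matches
the 4kB-per-1TB figure quoted above. The benefit depends entirely on how
often must_check_brt() returns False.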

How much of the ToR would we expect to be nonzero?  I think a disk image
(e.g. vmdk file) that's been updated incrementally for a while could easily
be spread across every 1GB chunk of the pool.  Cloning that file would
result in all-nonzero ToR entries, defeating its purpose.

Example analysis: a 1TB disk image with recordsize=8K has 128 million
blocks.  A 100TB pool has about 100,000 1GB regions.  If the blocks are
distributed randomly throughout the pool, there will be >1000 blocks of
this file in each region (128 million / 100,000 is roughly 1,300).
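
The arithmetic can be checked with a short script. This is only a sketch
under the same uniform-random placement assumption; the function names
are made up for illustration, and the exp(-N/R) estimate of the fraction
of regions left empty is a standard approximation added here, not
something measured on a real pool. It uses binary units, so the pool
comes out at 102,400 regions rather than a round 100,000.

```python
import math

TIB = 1 << 40
GIB = 1 << 30


def blocks_per_region(file_size, recordsize, pool_size, region_size=GIB):
    """Return (blocks in file, regions in pool, average blocks per region)."""
    nblocks = file_size // recordsize
    nregions = pool_size // region_size
    return nblocks, nregions, nblocks / nregions


def expected_empty_fraction(nblocks, nregions):
    """Fraction of regions expected to hold none of the file's blocks.

    With N blocks placed uniformly at random over R regions,
    P(region empty) = (1 - 1/R)^N, which is approximately exp(-N/R).
    """
    return math.exp(-nblocks / nregions)


nblocks, nregions, avg = blocks_per_region(1 * TIB, 8 * 1024, 100 * TIB)
print(nblocks, nregions, avg)
print(expected_empty_fraction(nblocks, nregions))
```

Under these assumptions the average is about 1,300 blocks per region and
the expected fraction of zero counters is effectively nil, so nearly
every zio_free() would still have to consult the BRT.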

--matt


>
> --
> Pawel Jakub Dawidek
>
> ------------------------------------------
> openzfs: openzfs-developer
> Permalink:
> https://openzfs.topicbox.com/groups/developer/Te62797341aee0806-M557cacb30e3094ff907e04f5
> Delivery options:
> https://openzfs.topicbox.com/groups/developer/subscription
>

------------------------------------------
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/Te62797341aee0806-M95110862540c5f29f14a361e
Delivery options: https://openzfs.topicbox.com/groups/developer/subscription
