On Wed, Jun 17, 2020 at 3:47 PM Pawel Jakub Dawidek <pa...@dawidek.net> wrote:
> On 6/15/20 09:18, Matthew Ahrens via openzfs-developer wrote:
> > However, even so, looking up in the BRT for every single zio_free()
> > would be a substantial cost. [...]
>
> After giving it some more thought we could avoid that cost by leveraging
> the fact that we operate on offsets within VDEVs.
>
> We could maintain a table of fixed size regions for each VDEV. The table
> entry is a reference counter. Let's call it Table of Regions (ToR)...
>
> For example we divide a VDEV into 1GB regions. Each region gets his own
> 32-bit counter (21-bit counter would be enough as we can get only 2^21
> 512-byte blocks in 1GB). Every time _new_ entry in RBT shows up, we
> increase the counter in ToR's entry for this block. Every time we free a
> block we take a look at ToR first to see if we should check RBT. If the
> counter for this region is 0 there are no entries in RBT, thus there is
> no need to consult RBT, so there is no additional cost for zio_free().
>
> ToR is extremely small. For 1GB regions and 32 counter it takes 4kB
> (four kilobytes) of RAM per 1TB per top-level VDEV.
>
> Note that ToR is only updated for a new entry in RBT or when entry is
> removed from RBT. We don't update ToR when we increase counter on an
> existing RBT entry.
>

How much of the ToR would we expect to be nonzero? I think a disk image
(e.g. vmdk file) that's been updated incrementally for a while could
easily be spread across every 1GB chunk of the pool. Cloning that file
would result in all-nonzero ToR entries, defeating its purpose.

Example analysis: disk image of 1TB with recordsize=8K has 128 million
blocks. A 100TB pool has 100,000x 1GB regions. If the blocks are
distributed randomly throughout the pool, there will be >1000 blocks of
this file in each region.
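For anyone who wants to redo the arithmetic with other sizes, here is a rough sketch in Python. It is not ZFS code: the function names (tor_bytes, tor_density) are made up for illustration, it uses binary units (so 102,400 regions rather than the rounded 100,000), and it assumes uniformly random block placement, with a Poisson approximation for the share of regions that would still have a zero counter.

```python
import math

KIB, GIB, TIB = 2**10, 2**30, 2**40


def tor_bytes(vdev_bytes, region_bytes=GIB, counter_bytes=4):
    """RAM needed for one counter per region of a top-level vdev."""
    return (vdev_bytes // region_bytes) * counter_bytes


def tor_density(file_bytes, recordsize, pool_bytes, region_bytes=GIB):
    """Return (blocks, regions, mean blocks per region, expected empty fraction).

    Assumption: blocks land uniformly at random across the pool, so the
    number of blocks per region is approximately Poisson(mean).
    """
    blocks = file_bytes // recordsize
    regions = pool_bytes // region_bytes
    mean = blocks / regions
    empty_fraction = math.exp(-mean)  # P(region holds zero blocks of the file)
    return blocks, regions, mean, empty_fraction


# The example from this message: 1TB image, recordsize=8K, 100TB pool.
blocks, regions, mean, empty = tor_density(1 * TIB, 8 * KIB, 100 * TIB)
print(f"blocks={blocks} regions={regions} mean/region={mean:.2f} empty={empty}")
print(f"ToR size per 1TB vdev: {tor_bytes(1 * TIB)} bytes")
```

With a mean above 1000 blocks per region, the expected share of regions with a zero counter is effectively nil under this model, so nearly every zio_free() would still fall through to the full lookup.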
--matt

>
> --
> Pawel Jakub Dawidek
>
> ------------------------------------------
> openzfs: openzfs-developer
> Permalink:
> https://openzfs.topicbox.com/groups/developer/Te62797341aee0806-M557cacb30e3094ff907e04f5
> Delivery options:
> https://openzfs.topicbox.com/groups/developer/subscription
> ------------------------------------------

openzfs: openzfs-developer
Permalink: https://openzfs.topicbox.com/groups/developer/Te62797341aee0806-M95110862540c5f29f14a361e
Delivery options: https://openzfs.topicbox.com/groups/developer/subscription