On 6/15/20 09:18, Matthew Ahrens via openzfs-developer wrote:
> However, even so, looking up in the BRT for every single zio_free()
> would be a substantial cost. [...]
After giving it some more thought, I think we could avoid that cost by leveraging the fact that we operate on offsets within VDEVs. We could maintain a table of fixed-size regions for each VDEV, where each table entry is a reference counter. Let's call it the Table of Regions (ToR)...

For example, we divide a VDEV into 1GB regions. Each region gets its own 32-bit counter (about 21 bits would be enough, as there can be only 2^21 512-byte blocks in 1GB). Every time a _new_ entry shows up in the BRT, we increase the counter in the ToR entry for the region containing this block. Every time we free a block, we look at the ToR first to see whether we need to check the BRT. If the counter for this region is 0, there are no BRT entries in the region, so there is no need to consult the BRT and no additional cost for zio_free().

The ToR is extremely small. With 1GB regions and 32-bit counters it takes 4kB (four kilobytes) of RAM per 1TB of each top-level VDEV.

Note that the ToR is only updated when a new entry is added to the BRT or an entry is removed from the BRT. We don't update the ToR when we increase the counter on an existing BRT entry.

--
Pawel Jakub Dawidek

------------------------------------------
openzfs: openzfs-developer
Permalink: https://openzfs.topicbox.com/groups/developer/Te62797341aee0806-M557cacb30e3094ff907e04f5
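To make the ToR idea concrete, here is a minimal sketch in C. It is not OpenZFS code: the names tor_t, tor_create(), tor_entry_added(), tor_entry_removed(), tor_may_have_entry() and tor_memory() are hypothetical, and it assumes one table per top-level VDEV, 1GB regions, and that a block is attributed to the region containing its starting offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 1GB regions, i.e. region index = offset >> 30. */
#define	TOR_REGION_SHIFT	30

/* Hypothetical per-VDEV Table of Regions: one counter per region. */
typedef struct tor {
	uint64_t	tor_nregions;
	uint32_t	*tor_counts;	/* number of BRT entries per region */
} tor_t;

tor_t *
tor_create(uint64_t vdev_size)
{
	tor_t *tor = malloc(sizeof (*tor));

	if (tor == NULL)
		return (NULL);
	/* Round up so a trailing partial region also gets a counter. */
	tor->tor_nregions =
	    (vdev_size + (1ULL << TOR_REGION_SHIFT) - 1) >> TOR_REGION_SHIFT;
	tor->tor_counts = calloc(tor->tor_nregions, sizeof (uint32_t));
	if (tor->tor_counts == NULL) {
		free(tor);
		return (NULL);
	}
	return (tor);
}

void
tor_destroy(tor_t *tor)
{
	free(tor->tor_counts);
	free(tor);
}

/* Call only when a NEW entry is inserted into the BRT. */
void
tor_entry_added(tor_t *tor, uint64_t offset)
{
	uint64_t region = offset >> TOR_REGION_SHIFT;

	assert(region < tor->tor_nregions);
	tor->tor_counts[region]++;
}

/* Call only when an entry is removed from the BRT. */
void
tor_entry_removed(tor_t *tor, uint64_t offset)
{
	uint64_t region = offset >> TOR_REGION_SHIFT;

	assert(region < tor->tor_nregions);
	assert(tor->tor_counts[region] > 0);
	tor->tor_counts[region]--;
}

/* Cheap check on the free path: false means the BRT lookup can be skipped. */
bool
tor_may_have_entry(const tor_t *tor, uint64_t offset)
{
	uint64_t region = offset >> TOR_REGION_SHIFT;

	assert(region < tor->tor_nregions);
	return (tor->tor_counts[region] != 0);
}

/* Memory used by the counters, in bytes. */
size_t
tor_memory(const tor_t *tor)
{
	return ((size_t)tor->tor_nregions * sizeof (uint32_t));
}
```

In this sketch the free path would call tor_may_have_entry() first and fall through to the real BRT lookup only when it returns true. For a 1TB VDEV, tor_create() allocates 1024 counters of 4 bytes each, which matches the 4kB figure above. Bumping the reference count of an existing BRT entry would not touch the table at all.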