> ### Ref block format
>
> A ref block is written as:
>
>     'r'
>     uint24( block_len )
>     ref_record+
>     uint32( restart_offset )+
>     uint16( restart_count )
>     padding?
>

So I learned that your current writer works in two passes per block,
i.e. each block is first assembled in memory and then, once it
looks complete, written out to disk.

This would allow us to rearrange the data during the actual
write-to-disk phase, for example like this:

  'r'
  uint24( restart_count )
  uint32( restart_offset )+
  ref_record+
  ref_record_endmarker
  padding?
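To make the reordering concrete, here is a minimal Python sketch of the write-out phase for this layout. It assumes big-endian integers, restart offsets measured from the start of the block, and records that arrive already encoded; `write_proposed_block` and its parameters are hypothetical names, not the existing writer. Because the block is buffered, the record positions can be computed once the size of the restart table is known:

```python
import struct

# Two zero bytes, as proposed for ref_record_endmarker.
END_MARKER = b"\x00\x00"

def write_proposed_block(records, restart_indices, block_size):
    """Serialize buffered records into the proposed layout (sketch).

    records: list of already-encoded ref_record byte strings (hypothetical input).
    restart_indices: indices into `records` that begin a restart point.
    Assumptions: big-endian integers, offsets relative to the block start.
    """
    restart_count = len(restart_indices)
    # 'r' (1 byte) + uint24 restart_count (3 bytes) + one uint32 per restart.
    first_record = 4 + 4 * restart_count
    # Record positions are only known once the restart table size is fixed,
    # which is why the block has to be buffered before writing.
    starts, pos = [], first_record
    for rec in records:
        starts.append(pos)
        pos += len(rec)

    out = bytearray(b"r")
    out += restart_count.to_bytes(3, "big")
    for i in restart_indices:
        out += struct.pack(">I", starts[i])
    for rec in records:
        out += rec
    out += END_MARKER
    if len(out) > block_size:
        raise ValueError("records do not fit in one block")
    out += bytes(block_size - len(out))  # zero padding up to block_size
    return bytes(out)
```

For example, `write_proposed_block([b"aaa", b"bb", b"c"], [0, 2], 32)` places the records at byte 12 and stores restart offsets 12 and 17.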

(A) In nearby emails we discussed making the restart offsets
24-bit, but now the restart table starts at byte 4 of the block
('r' plus the uint24 restart_count), which is 32-bit aligned, so we
could keep them 32-bit for simplicity of reading.
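Since 'r' plus the uint24 restart_count take exactly 1 + 3 = 4 bytes, the i-th restart offset sits at byte 4 + 4*i. A small reader sketch, assuming big-endian integers; `restart_count`, `restart_offset` and `HEADER_SIZE` are made-up names for illustration:

```python
import struct

# 'r' (1 byte) + uint24 restart_count (3 bytes); assumed big-endian.
HEADER_SIZE = 4

def restart_count(block):
    """Read restart_count from its fixed position right after 'r'."""
    if block[0:1] != b"r":
        raise ValueError("not a ref block")
    return int.from_bytes(block[1:HEADER_SIZE], "big")

def restart_offset(block, i):
    """Return the i-th restart offset: a fixed-stride, 4-byte-aligned lookup."""
    n = restart_count(block)
    if not 0 <= i < n:
        raise IndexError(i)
    (off,) = struct.unpack_from(">I", block, HEADER_SIZE + 4 * i)
    return off
```

No field at the end of the block has to be consulted to find the restart table, which is what makes a binary search over the restart points straightforward.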

(B) Note that there is no block_len field; it was originally needed
only to look up the position of restart_count. (Even for that purpose
we could have renamed it to padding_len, from which the position of
restart_count can be decoded just as easily.)
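For comparison, a sketch of locating the restart table in the quoted layout, using either block_len or the padding_len variant. It assumes big-endian integers, that block_len counts bytes from the start of the block, and that padding_len counts the trailing padding bytes; `read_restart_offsets` is a hypothetical helper:

```python
import struct

def read_restart_offsets(block, header_is_padding_len=False):
    """Find restart_count at the end of the data in the quoted layout.

    Assumptions: big-endian integers; block_len is measured from the block
    start; padding_len is the number of trailing padding bytes.
    """
    if block[0:1] != b"r":
        raise ValueError("not a ref block")
    field = int.from_bytes(block[1:4], "big")
    if header_is_padding_len:
        end = len(block) - field  # everything before the trailing padding
    else:
        end = field               # block_len points just past restart_count
    # restart_count is the last uint16 before `end`; offsets precede it.
    (count,) = struct.unpack_from(">H", block, end - 2)
    table_start = end - 2 - 4 * count
    return list(struct.unpack_from(f">{count}I", block, table_start))
```

Either way the reader has to jump to the end of the data first, which is the extra step the proposed layout avoids.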

We no longer need block_len, as restart_count now comes right
after the 'r'.

Instead, we'd have a ref_record_endmarker that reads as a ref record
with both prefix and suffix length set to 0 and type deletion (so that
no further payload follows). The end marker would then only need two
zero bytes, which makes it indistinguishable from padding.
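A sketch of why the end marker costs only two bytes. It assumes an unsigned LEB128 varint, deletion as value type 0, and the type packed into the low 3 bits of the suffix-length varint; all three are assumptions for illustration (common varint schemes, including git's, encode 0 as a single 0x00 byte, so the conclusion does not depend on the first one):

```python
DELETION = 0  # assumption: deletion is value type 0

def encode_varint(n):
    """Unsigned LEB128 (assumed encoding for this sketch)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf, pos):
    """Decode an unsigned LEB128 value; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def encode_record_header(prefix_len, suffix_len, value_type):
    # Assumption: value_type occupies the low 3 bits of the second varint.
    return encode_varint(prefix_len) + encode_varint((suffix_len << 3) | value_type)

END_MARKER = encode_record_header(0, 0, DELETION)

def is_end(buf, pos):
    """True for the end marker and, equally, for zero padding."""
    prefix_len, pos = decode_varint(buf, pos)
    packed, pos = decode_varint(buf, pos)
    return prefix_len == 0 and packed >> 3 == 0 and packed & 7 == DELETION
```

A reader that stops on this check terminates the same way whether it hits the explicit marker or the zero padding after it.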
