Hi everyone,

For the v4 adaptive metadata tree work, we are planning on embedding
bitmaps in the root manifest that act as metadata/manifest deletion vectors
(MDVs). Amogh looked into how much space this would take in the manifests
and we found that the Roaring format is pretty large at the scale we're
targeting. When we compare it to raw bitmaps, we would be storing an extra
500-2,000 bytes per bitmap. As a result, I tried to see if we could use the
ideas from Roaring, but with smaller containers to fit better with our more
limited use case: manifests that contain roughly 50,000 entries (a single
Roaring container). Since it is like Roaring but smaller, I've been calling
the new format Mumbling.

You can view the results comparing Roaring, raw bitmaps, and Mumbling
<https://docs.google.com/spreadsheets/d/1jOCMS6LYxqFcSKBa848tos2vhRBzZJQN3RqyUa0ElSY/edit?gid=0#gid=0>.
The results look promising: compressed sizes track much more closely to the
raw bitmap and the format has smaller overhead in memory than even Roaring
because of the more granular containers.

The next steps are to discuss whether we want to use this format. To do
that, I've written up a Mumbling spec
<https://gist.github.com/rdblue/20ff3fd91df03483ed84e6de26583743> document
so that it is clear what exactly the format is doing. That should help us
evaluate the design choices
<https://gist.github.com/rdblue/20ff3fd91df03483ed84e6de26583743#design-choices>
and the cost of implementing this.

I think that we should move forward with this bitmap format. It would save
quite a bit of space in the root manifest and it is a fairly simple spec.
My size tests used an implementation in Rust
<https://gist.github.com/rdblue/c989c8c2fecee951528973cbcc183a9b> that is
fairly compact so it is not a huge amount of work. I've also reached out
and we may be able to partner with the Roaring community to make this a
part of the larger standard.

Please take a look and discuss. Thanks,

Ryan

Reply via email to