I would argue that other languages should not be a blocker here.

I can speak on behalf of iceberg-go: implementing this as a native feature
there is doable.

Implementing Mumbling natively in Go is, in the worst case, ~1–2 weeks of
isolated work in a new internal package; in the best case (Roaring upstream
accepts it), it is days of glue code. I would assume a similar cost for
other languages (primarily Java and C++).

There is no language-specific risk here — the format is deliberately
simple, the Rust prototype is small, the spec has concrete byte-level test
vectors, and it does not touch any other iceberg-go packages.

Best,
Andrei

Tue, Apr 21, 2026, 15:30 Maximilian Michels <[email protected]>:

> Hi Ryan,
>
> Thanks for the detailed analysis. The storage savings (for the sparse
> range) and the general memory savings over Roaring are compelling.
>
> My main concern would be having to maintain our own bitmap format
> across all implementations of Iceberg. I suppose it would be mainly
> Java and Rust, as we can leverage Rust bindings for other languages,
> but Roaring already has implementations for every language Iceberg
> supports today.
>
> If we can include Mumbling as part of Roaring, this becomes a no-brainer.
>
> -Max
>
> On Tue, Apr 21, 2026 at 1:02 AM Ryan Blue <[email protected]> wrote:
> >
> > Hi everyone,
> >
> > For the v4 adaptive metadata tree work, we are planning on embedding
> bitmaps in the root manifest that act as metadata/manifest deletion vectors
> (MDVs). Amogh looked into how much space this would take in the manifests
> and we found that the Roaring format is pretty large at the scale we're
> targeting. When we compare it to raw bitmaps, we would be storing an extra
> 500-2,000 bytes per bitmap. As a result, I tried to see if we could use the
> ideas from Roaring, but with smaller containers to fit better with our more
> limited use case: manifests that contain roughly 50,000 entries (a single
> Roaring container). Since it is like Roaring but smaller, I've been calling
> the new format Mumbling.
> >
> > You can view the results comparing Roaring, raw bitmaps, and Mumbling.
> The results look promising: compressed sizes track much more closely to the
> raw bitmap and the format has smaller overhead in memory than even Roaring
> because of the more granular containers.
> >
> > The next steps are to discuss whether we want to use this format. To do
> that, I've written up a Mumbling spec document so that it is clear what
> exactly the format is doing. That should help us evaluate the design
> choices and the cost of implementing this.
> >
> > I think that we should move forward with this bitmap format. It would
> save quite a bit of space in the root manifest and it is a fairly simple
> spec. My size tests used an implementation in Rust that is fairly compact
> so it is not a huge amount of work. I've also reached out and we may be
> able to partner with the Roaring community to make this a part of the
> larger standard.
> >
> > Please take a look and discuss. Thanks,
> >
> > Ryan
>
