Thank you very much, Zhonghang, for driving this initiative.

I noticed this issue as well, and I think your design looks great!

Here are a few suggestions I have:

If my memory serves me correctly, the core problem is that using
SeekableInputStream to load the header is too slow.
Specifically, loading each value and its offset requires separate calls to
SeekableInputStream.
Therefore, I propose we consider adding the header size to the metadata and
using a ByteBuffer to read the entire header in one go.
This approach could also be applied to your design.
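To illustrate the idea, here is a minimal Java sketch under stated assumptions. The header layout (an entry count followed by int value/offset pairs) and the names HeaderSketch, Header, write and read are hypothetical and are not Paimon's actual format or API. Real dictionary values may not be ints. The size prefix stands in for a "header size" field that would live in the index metadata, and a plain InputStream is used instead of Paimon's SeekableInputStream so the example stays self-contained.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: a bitmap-index header prefixed by its total size, so a
 * reader can fetch the whole header with one bulk read and decode it from an
 * in-memory ByteBuffer instead of issuing one stream call per value/offset.
 * The layout is an assumption for illustration only.
 */
public class HeaderSketch {

    /** Decoded header: parallel arrays of dictionary values and bitmap offsets. */
    public static final class Header {
        public final int[] values;
        public final int[] offsets;

        Header(int[] values, int[] offsets) {
            this.values = values;
            this.offsets = offsets;
        }
    }

    /** Serializes [headerSize][entryCount][(value, offset) ...], big-endian. */
    public static byte[] write(int[] values, int[] offsets) {
        // Header body = entry count + one (value, offset) pair per entry.
        int headerSize = Integer.BYTES + values.length * 2 * Integer.BYTES;
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + headerSize);
        buf.putInt(headerSize);    // stands in for a hypothetical metadata field
        buf.putInt(values.length); // number of entries
        for (int i = 0; i < values.length; i++) {
            buf.putInt(values[i]);
            buf.putInt(offsets[i]);
        }
        return buf.array();
    }

    /** Reads the size, then the whole header body in a single readFully call. */
    public static Header read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int headerSize = data.readInt();
        byte[] headerBytes = new byte[headerSize];
        data.readFully(headerBytes); // one bulk read instead of many small ones

        // Everything below decodes from memory; no further stream calls.
        ByteBuffer buf = ByteBuffer.wrap(headerBytes);
        int count = buf.getInt();
        int[] values = new int[count];
        int[] offsets = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = buf.getInt();
            offsets[i] = buf.getInt();
        }
        return new Header(values, offsets);
    }
}
```

With this layout, reading the header costs two stream calls (the size and the body) regardless of how many entries it holds. If the size were stored in the index metadata as proposed, a single seek plus one read should be enough.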

Best,
Jialiang

On Mon, Jan 20, 2025 at 10:24 AM Jingsong Li <jingsongl...@gmail.com> wrote:

> Thanks zhonghang,
>
> Overall design looks good to me!
>
> I have some questions:
>
> 1. Are there any other bitmap designs we can reference, such as
> the bitmap structure of StarRocks' internal tables?
> 2. We should store the size of the roaring bitmap to make
> deserialization faster. See
> https://github.com/apache/incubator-paimon/pull/4765
> 3. Maybe we can find better names for the dictionaries; for example, the
> index could have three parts: Index header + Index blocks + Bitmap blocks
>
> Best,
> Jingsong
>
> On Fri, Jan 17, 2025 at 5:16 PM Jingsong Li <jingsongl...@gmail.com> wrote:
> >
> > Hi zhonghang,
> >
> > Please grant public access to the document.
> >
> > Best,
> > Jingsong
> >
> > On Fri, Jan 17, 2025 at 3:46 PM zhonghang <1649067...@qq.com.invalid> wrote:
> > >
> > > Hi devs,
> > >     We found that the first version of the bitmap index had some
> > > performance issues in high-cardinality scenarios.
> > > We have now made some optimizations and hope to discuss
> > > them with you. The detailed design document is linked below [1].
> > >
> > >
> > > [1]: https://docs.google.com/document/d/11dJlGlSX3dwYKKrPN0DQ2XQTsx6d9wI6DTBIiiBwomM/edit?usp=sharing
> > >
> > >
> > > Thanks
> > > ZhonghangLiu.
> > >
> > >
> > > zhonghang
> > > 1649067...@qq.com
>
