Thank you very much, Zhonghang, for driving this initiative. I noticed this issue as well, and I think your design looks great!
Here is one suggestion. If I remember correctly, the core problem is that loading the header through SeekableInputStream is too slow: reading each value and its offset takes a separate small read call on SeekableInputStream. I propose adding the header size to the index metadata, so the reader can fetch the entire header in a single read into a ByteBuffer and then parse it in memory. The same approach could also be applied to your design.

Best,
Jialiang

Jingsong Li <jingsongl...@gmail.com> wrote on Monday, January 20, 2025, at 10:24:
> Thanks zhonghang,
>
> Overall design looks good to me!
>
> I have some questions:
>
> 1. Are there any other bitmap designs we can reference, such as
> the bitmap structure of StarRocks' internal tables?
> 2. We should store the size of the roaring bitmap to make
> deserialization faster. See
> https://github.com/apache/incubator-paimon/pull/4765
> 3. Maybe we can find better names for the dictionaries; for example, we could
> split the file into three parts: Index header + Index blocks + Bitmap blocks
>
> Best,
> Jingsong
>
> On Fri, Jan 17, 2025 at 5:16 PM Jingsong Li <jingsongl...@gmail.com> wrote:
> >
> > Hi zhonghang,
> >
> > Please grant public access to the document.
> >
> > Best,
> > Jingsong
> >
> > On Fri, Jan 17, 2025 at 3:46 PM zhonghang <1649067...@qq.com.invalid> wrote:
> > >
> > > Hi, devs:
> > > We found that the first version of the bitmap index had
> > > some performance issues in high-cardinality scenarios.
> > > We have now made some optimizations and hope to discuss
> > > them with you. The detailed design document is at [1].
> > >
> > > [1]:
> > > https://docs.google.com/document/d/11dJlGlSX3dwYKKrPN0DQ2XQTsx6d9wI6DTBIiiBwomM/edit?usp=sharing
> > >
> > > Thanks
> > > ZhonghangLiu.
> > >
> > > zhonghang
> > > 1649067...@qq.com
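To make the single-read idea concrete, here is a minimal sketch. It assumes a hypothetical header layout of [entry count][(value, offset) pairs] encoded as big-endian ints. The header size is written as a 4-byte prefix to stand in for the proposed metadata field, and a plain java.io.InputStream stands in for Paimon's SeekableInputStream. The names BitmapHeaderSketch, writeHeader and readHeader are made up for illustration and are not Paimon's actual format or API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical layout, NOT Paimon's real on-disk format:
// [int headerSize][int entryCount][entryCount x (int value, int offset)]
final class BitmapHeaderSketch {

    // Serializes the header and prefixes it with its size in bytes.
    static byte[] writeHeader(Map<Integer, Integer> valueToOffset) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(body);
        out.writeInt(valueToOffset.size());
        for (Map.Entry<Integer, Integer> e : valueToOffset.entrySet()) {
            out.writeInt(e.getKey());   // dictionary value
            out.writeInt(e.getValue()); // offset of its bitmap block
        }
        out.flush();
        byte[] headerBytes = body.toByteArray();
        // The size prefix stands in for a "header size" field in the index metadata.
        return ByteBuffer.allocate(Integer.BYTES + headerBytes.length)
                .putInt(headerBytes.length)
                .put(headerBytes)
                .array();
    }

    // Reads the size, fetches the whole header with one readFully call,
    // then decodes every entry from the in-memory ByteBuffer.
    static Map<Integer, Integer> readHeader(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        int headerSize = din.readInt();
        byte[] headerBytes = new byte[headerSize];
        din.readFully(headerBytes); // one call for the entire header
        ByteBuffer header = ByteBuffer.wrap(headerBytes);
        int entryCount = header.getInt();
        Map<Integer, Integer> valueToOffset = new LinkedHashMap<>();
        for (int i = 0; i < entryCount; i++) {
            int value = header.getInt();
            int offset = header.getInt();
            valueToOffset.put(value, offset);
        }
        return valueToOffset;
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, Integer> header = new LinkedHashMap<>();
        header.put(10, 0);
        header.put(42, 128);
        byte[] bytes = writeHeader(header);
        System.out.println(readHeader(new ByteArrayInputStream(bytes))); // {10=0, 42=128}
    }
}
```

With the size known up front, the reader makes one bulk read and decodes all entries from memory, instead of issuing one stream call per value and offset. Storing each roaring bitmap's serialized size, as Jingsong suggests, would let the reader slice bitmap blocks the same way.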