Hi, Zhonghang.

> Since I guess there will be a buffer in the implementation of SeekableInputStream to reduce the number of IO operations, the length of the header and the length of the bitmap are not recorded in the first version.

Same here: the BSI index I contributed has a similar problem, which I need to spend some time fixing. But that is another topic.
Additionally, would you be able to take some time to conduct a benchmark comparison between your design and ByteBuffer? If it turns out that ByteBuffer can resolve this issue, we might consider postponing the introduction of secondary indexes for now.

Best,
JiaLiang.

jialiang tan <tanjialiang1...@gmail.com> wrote on Tue, Jan 21, 2025 at 19:51:

> Thank you very much, Zhonghang, for driving this initiative.
>
> I noticed this issue as well, and I think your design looks great!
>
> Here are a few suggestions I have:
>
> If my memory serves me correctly, the core problem is that using
> SeekableInputStream to load the header is too slow.
> Specifically, loading the value and offset requires multiple calls to
> SeekableInputStream.
> Therefore, I propose we consider adding a header size in the metadata and
> use ByteBuffer to retrieve the entire header in one go.
> This approach could also be applied to your designs.
>
> Best,
> Jialiang.
>
> Jingsong Li <jingsongl...@gmail.com> wrote on Mon, Jan 20, 2025 at 10:24:
>
>> Thanks zhonghang,
>>
>> Overall design looks good to me!
>>
>> I have some questions:
>>
>> 1. Are there any other bitmap designs that can be referenced, such as
>> the bitmap structure of starrocks' internal tables?
>> 2. We should store the size of the roaring bitmap to make
>> deserialization faster. See
>> https://github.com/apache/incubator-paimon/pull/4765
>> 3. Maybe we can have better names for dictionaries, maybe we can have
>> three parts: Index header + Index blocks + Bitmap blocks
>>
>> Best,
>> Jingsong
>>
>> On Fri, Jan 17, 2025 at 5:16 PM Jingsong Li <jingsongl...@gmail.com> wrote:
>> >
>> > Hi zhonghang,
>> >
>> > Please grant public access to the document.
>> >
>> > Best,
>> > Jingsong
>> >
>> > On Fri, Jan 17, 2025 at 3:46 PM zhonghang <1649067...@qq.com.invalid> wrote:
>> > >
>> > > Hi, devs:
>> > > We found that the first version of the bitmap index had
>> > > some performance issues in high cardinality scenarios.
>> > > We have now made some optimizations and hope to discuss
>> > > them with you. The detailed design document is as follows [1].
>> > >
>> > > [1]:
>> > > https://docs.google.com/document/d/11dJlGlSX3dwYKKrPN0DQ2XQTsx6d9wI6DTBIiiBwomM/edit?usp=sharing
>> > >
>> > > Thanks
>> > > ZhonghangLiu.
>> > >
>> > > zhonghang
>> > > 1649067...@qq.com
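
For reference, the suggestion in this thread to record a header size and then fetch the whole header with ByteBuffer could look roughly like the sketch below. It is only an illustration under assumptions: the file layout (a 4-byte header size, then an entry count, then value/offset pairs as ints), the class name HeaderReadSketch, and the methods writeIndex and readHeader are all hypothetical, and a plain java.io.InputStream stands in for SeekableInputStream. It is not the actual bitmap index format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

class HeaderReadSketch {

    // Hypothetical layout (an assumption, not the real index format):
    // [int headerSize][int entryCount][(int value, int offset) * entryCount]
    static byte[] writeIndex(Map<Integer, Integer> valueToOffset) throws IOException {
        ByteArrayOutputStream headerBytes = new ByteArrayOutputStream();
        DataOutputStream header = new DataOutputStream(headerBytes);
        header.writeInt(valueToOffset.size());
        for (Map.Entry<Integer, Integer> entry : valueToOffset.entrySet()) {
            header.writeInt(entry.getKey());
            header.writeInt(entry.getValue());
        }
        header.flush();

        ByteArrayOutputStream fileBytes = new ByteArrayOutputStream();
        DataOutputStream file = new DataOutputStream(fileBytes);
        // Record the header size up front so a reader can fetch the header in one read.
        file.writeInt(headerBytes.size());
        file.write(headerBytes.toByteArray());
        file.flush();
        return fileBytes.toByteArray();
    }

    static Map<Integer, Integer> readHeader(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int headerSize = data.readInt();
        byte[] raw = new byte[headerSize];
        // One bulk read replaces one stream call per value and per offset.
        data.readFully(raw);

        // ByteBuffer is big-endian by default, matching DataOutputStream.
        ByteBuffer buffer = ByteBuffer.wrap(raw);
        int entryCount = buffer.getInt();
        Map<Integer, Integer> valueToOffset = new LinkedHashMap<>();
        for (int i = 0; i < entryCount; i++) {
            int value = buffer.getInt();
            int offset = buffer.getInt();
            valueToOffset.put(value, offset);
        }
        return valueToOffset;
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, Integer> offsets = new LinkedHashMap<>();
        offsets.put(7, 0);
        offsets.put(42, 128);
        byte[] file = writeIndex(offsets);
        System.out.println(readHeader(new ByteArrayInputStream(file))); // prints {7=0, 42=128}
    }
}
```

After the size prefix is read, all remaining parsing happens in memory, so the number of stream calls no longer grows with the number of distinct values. Whether this is enough for high cardinality columns is what the proposed benchmark would need to show.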