Hi devs,

I would like to start a discussion about PIP-XXX: Introduce the
range-bitmap file index [1].

Currently, we support the bitmap and bsi indexes, each of which has its own
advantages and disadvantages.

In the bitmap index:
1. The bitmap v2 index performs very well for EQ predicates, but it
supports only this type of predicate.
2. In high-cardinality scenarios, it needs one bitmap per distinct value,
which may result in a large index file.
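To make the cardinality issue concrete, here is a minimal, generic sketch of an equality-encoded bitmap index. It is not Paimon's bitmap v2 file format. It assumes Python ints used as bitsets, and `build_bitmap_index`, `eq` and `less_than` are hypothetical helpers for illustration only. An EQ lookup touches a single bitmap, a range predicate would have to OR together one bitmap per qualifying value, and the index holds one bitmap per distinct value.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """One bitmap (a Python int used as a bitset) per distinct value."""
    bitmaps = defaultdict(int)
    for row, value in enumerate(values):
        bitmaps[value] |= 1 << row
    return dict(bitmaps)


def rows_of(bitmap):
    """Decode a bitset into a sorted list of row ids."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]


def eq(index, value):
    # EQ is a single dictionary lookup.
    return rows_of(index.get(value, 0))


def less_than(index, bound):
    # A range predicate must OR every bitmap whose key qualifies,
    # i.e. it may touch one bitmap per distinct value.
    result = 0
    for value, bitmap in index.items():
        if value < bound:
            result |= bitmap
    return rows_of(result)


index = build_bitmap_index(["b", "a", "c", "a", "d"])
print(len(index))             # 4 bitmaps for 4 distinct values
print(eq(index, "a"))         # [1, 3]
print(less_than(index, "c"))  # [0, 1, 3]
```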

In the bsi index:
1. The bsi index supports the EQ and Range predicates, but only for numeric
data types.

To address the shortcomings of both the bitmap and bsi indexes, I would
like to propose a new type of index: range-bitmap.

It combines the advantages of both the bitmap and bsi indexes. It supports
EQ and Range predicate evaluation, as well as index building for all basic
data types, in particular STRING, DOUBLE and FLOAT. Compared to the bitmap
v2 index, it reduces the number of bitmaps from one per distinct value to
roughly log₂ of the number of distinct values.
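One way to get both properties is to dictionary-encode the values and then bit-slice the resulting codes. The sketch below assumes that design. It is a hypothetical illustration (`RangeBitmapSketch` and its methods are made-up names), not the implementation in [2], which may lay out its slices differently (for example with range encoding). It also assumes the indexed values are mutually comparable (no nulls or NaN). Sorting the dictionary makes code order match value order, so strings and floats work, and N distinct values need only ceil(log₂ N) slices.

```python
from bisect import bisect_left, bisect_right


def rows_of(bitmap):
    """Decode an int-based bitset into a sorted list of row ids."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]


class RangeBitmapSketch:
    """Hypothetical sketch: dictionary-encode values, then bit-slice the codes."""

    def __init__(self, values):
        # Sorted distinct values -> dense codes 0..N-1, so code order matches
        # value order and any mutually comparable type (str, float, ...) works.
        self.dictionary = sorted(set(values))
        code_of = {v: c for c, v in enumerate(self.dictionary)}
        # ceil(log2(N)) slices suffice for codes 0..N-1 (keep at least one).
        self.num_slices = max(1, (len(self.dictionary) - 1).bit_length())
        self.all_rows = (1 << len(values)) - 1
        # slices[i] has bit `row` set when bit i of that row's code is 1.
        self.slices = [0] * self.num_slices
        for row, value in enumerate(values):
            code = code_of[value]
            for i in range(self.num_slices):
                if code >> i & 1:
                    self.slices[i] |= 1 << row

    def _code_lte(self, k):
        """Bitset of rows whose code <= k (classic bit-sliced comparison)."""
        lt, eq = 0, self.all_rows
        for i in reversed(range(self.num_slices)):
            if k >> i & 1:
                # Rows matching k on higher bits with a 0 here are strictly smaller.
                lt |= eq & ~self.slices[i]
                eq &= self.slices[i]
            else:
                eq &= ~self.slices[i]
        return lt | eq

    def lte(self, value):
        """Rows with value <= `value`."""
        k = bisect_right(self.dictionary, value) - 1
        return rows_of(self._code_lte(k)) if k >= 0 else []

    def gt(self, value):
        """Rows with value > `value`."""
        k = bisect_right(self.dictionary, value) - 1
        below = self._code_lte(k) if k >= 0 else 0
        return rows_of(self.all_rows & ~below)

    def eq(self, value):
        """Rows with value == `value`: (code <= k) AND NOT (code <= k - 1)."""
        k = bisect_left(self.dictionary, value)
        if k == len(self.dictionary) or self.dictionary[k] != value:
            return []
        below = self._code_lte(k - 1) if k > 0 else 0
        return rows_of(self._code_lte(k) & ~below)


cities = ["Paris", "Berlin", "Tokyo", "Berlin", "Oslo", "Madrid", "Lima", "Rome"]
idx = RangeBitmapSketch(cities)
print(idx.num_slices)      # 3 slices for 7 distinct values
print(idx.eq("Berlin"))    # [1, 3]
print(idx.lte("Madrid"))   # [1, 3, 5, 6]
print(idx.gt("Madrid"))    # [0, 2, 4, 7]
```

With 7 distinct cities, this sketch stores 3 slices where an equality-encoded bitmap index would store 7 bitmaps, and both EQ and range predicates reduce to the same bit-sliced `<=` comparison on dictionary codes.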

In addition, because the range-bitmap index outperforms the bsi index in
all evaluation cases, I propose that we mark the bsi index as deprecated.

See the implementation [2].

Looking forward to your feedback, thanks!

[1] https://docs.google.com/document/d/14YXPtCUmvjwozdLhgWJdPgHrVYOTdv9uiC1N2GlNvG4/edit?usp=sharing
[2] https://github.com/Tan-JiaLiang/paimon/tree/feature/rangebitmapV2

Best,
Tan JiaLiang.
