steFaiz opened a new issue, #6834: URL: https://github.com/apache/paimon/issues/6834
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.

### Motivation

At present, the only scalar index we mainly support is the Bitmap index. Many thanks to @leaves12138 for the implementation, which has established a solid framework for scalar indexing.

However, a conventional Bitmap index has significant limitations and does not work well on high-cardinality data, which is common for int, double, and string columns.

A global B-Tree index can be built on any comparable data type and provides efficient point lookups and range queries. It is also more amenable to distributed parallel processing during updates and reads. Therefore, this issue proposes implementing a distributed global B-Tree index.

### Solution

The basic implementation will be built on the SST FileFormat introduced in https://github.com/apache/paimon/issues/6734, providing point lookup and range query capabilities that cover most common SQL filter predicates.

## index construction

Index construction can be implemented efficiently via a range shuffle in Flink or Spark: each writer task writes the index files for its assigned key range, and a commit task then performs a single unified commit of all generated files, as illustrated below:

<img width="1498" height="640" alt="Image" src="https://github.com/user-attachments/assets/d8421b0b-f30e-4f2a-a3f1-2157f191aabe" />

## index query

The metadata of each B-Tree index file will record information such as the file's min key, max key, and whether it contains nulls (hasNulls). During index planning, we can use this metadata to prune candidate files, then query the remaining files in parallel and merge the results.

We will provide more details in future PRs.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [x] I'm willing to submit a PR!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
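
The file-pruning step described under "index query" could look roughly like the sketch below. It is a minimal Python illustration under stated assumptions, not Paimon's implementation: `IndexFileMeta`, `prune_for_range`, and `prune_for_is_null` are hypothetical names, keys are assumed to be integers, and query ranges are treated as closed intervals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IndexFileMeta:
    """Hypothetical per-file metadata; field names are illustrative only."""
    name: str
    min_key: Optional[int]  # None if the file holds no non-null keys
    max_key: Optional[int]
    has_nulls: bool


def prune_for_range(files: List[IndexFileMeta], lower: int, upper: int) -> List[IndexFileMeta]:
    """Keep files whose [min_key, max_key] overlaps the closed range [lower, upper]."""
    return [
        f for f in files
        if f.min_key is not None
        and f.max_key is not None
        and f.min_key <= upper
        and f.max_key >= lower
    ]


def prune_for_is_null(files: List[IndexFileMeta]) -> List[IndexFileMeta]:
    """Keep only files that may contain null keys (for an IS NULL predicate)."""
    return [f for f in files if f.has_nulls]


# Three index files covering disjoint key ranges, as a range shuffle would produce.
files = [
    IndexFileMeta("index-0", 1, 10, False),
    IndexFileMeta("index-1", 11, 20, True),
    IndexFileMeta("index-2", 21, 30, False),
]

range_hits = [f.name for f in prune_for_range(files, 8, 12)]
point_hits = [f.name for f in prune_for_range(files, 15, 15)]
null_hits = [f.name for f in prune_for_is_null(files)]
```

A point lookup is the special case `lower == upper`. The files that survive pruning would then be queried in parallel and their results merged.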
