steFaiz opened a new issue, #6834:
URL: https://github.com/apache/paimon/issues/6834

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/paimon/issues) 
and found nothing similar.
   
   
   ### Motivation
   
   At present, scalar indexing mainly relies on the Bitmap index. Many 
thanks to @leaves12138 for the implementation, which established a solid 
framework for scalar indexing. However, a conventional Bitmap index has 
significant limitations: it does not work well on high-cardinality columns, 
which are common for int, double, and string types.
   
   A global B-Tree index can be built on any comparable data type, providing 
efficient point lookups and range queries. In addition, it is more amenable to 
distributed parallel processing during updates and reads. Therefore, this issue 
proposes implementing a distributed global B-Tree index.
   
   ### Solution
   
   The basic implementation will be built on the SST FileFormat introduced in 
https://github.com/apache/paimon/issues/6734, providing point lookup and range 
query capabilities to cover most common SQL filter predicates.
   
   #### Index construction
   Index construction can be efficiently implemented via range shuffle in Flink 
or Spark: different writer tasks are responsible for writing index files for 
their assigned key ranges, and a commit task then performs a unified commit of 
all generated files, as illustrated below:
   
   <img width="1498" height="640" alt="Image" 
src="https://github.com/user-attachments/assets/d8421b0b-f30e-4f2a-a3f1-2157f191aabe"
 />
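
   A minimal single-process sketch of the range-shuffle flow described above, 
written in Python for brevity. All names here (`IndexFile`, `sample_boundaries`, 
`range_shuffle`, `write_index_file`, `commit`) are hypothetical and are not 
Paimon, Flink, or Spark APIs. Sampling the keys to choose split points is an 
assumption about how the key ranges would be assigned.

```python
import bisect
import random
from dataclasses import dataclass


@dataclass
class IndexFile:
    """Hypothetical stand-in for one B-Tree index file plus its key range."""
    min_key: int
    max_key: int
    entries: list  # sorted (key, row_id) pairs


def sample_boundaries(keys, num_tasks, sample_size=1000, seed=0):
    """Pick num_tasks - 1 split points from a sorted sample of the keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    return [sample[len(sample) * i // num_tasks] for i in range(1, num_tasks)]


def range_shuffle(records, boundaries):
    """Route each (key, row_id) record to the writer that owns its key range."""
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for key, row_id in records:
        partitions[bisect.bisect_right(boundaries, key)].append((key, row_id))
    return partitions


def write_index_file(partition):
    """Writer task: sort the assigned range and emit one index file."""
    entries = sorted(partition)
    return IndexFile(entries[0][0], entries[-1][0], entries)


def commit(files):
    """Commit task: collect every writer's file and publish them together."""
    return sorted(files, key=lambda f: f.min_key)


keys = [42, 7, 19, 88, 3, 61, 25, 70]
records = [(key, row_id) for row_id, key in enumerate(keys)]
boundaries = sample_boundaries(keys, num_tasks=3)
partitions = range_shuffle(records, boundaries)
files = commit([write_index_file(p) for p in partitions if p])

for f in files:
    print(f.min_key, f.max_key, f.entries)
```

   Because each writer owns a disjoint key range, the resulting files do not 
overlap, which is what makes min/max-based pruning effective at query time.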
   
   #### Index query
   The metadata of B-Tree index files will record information such as the 
file’s min key, max key, and whether it contains nulls (hasNulls). During index 
planning, we can use this metadata to prune candidate files, then query the 
remaining files in parallel and merge the results.
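
   The planning step above can be sketched as follows, again in Python and 
under stated assumptions. `IndexFileMeta`, `prune`, `scan_range`, 
`range_query`, and `is_null_query` are hypothetical names, a sorted list 
stands in for the SST file, and storing null row ids in a separate list is 
an assumption made only for illustration.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class IndexFileMeta:
    """Hypothetical per-file metadata: min key, max key, hasNulls."""
    min_key: int
    max_key: int
    has_nulls: bool
    entries: list                      # sorted (key, row_id) pairs
    null_row_ids: list = field(default_factory=list)


def prune(files, low, high):
    """Keep only files whose [min_key, max_key] overlaps [low, high]."""
    return [f for f in files if f.max_key >= low and f.min_key <= high]


def scan_range(f, low, high):
    """Binary-search one file for all row ids with low <= key <= high."""
    keys = [k for k, _ in f.entries]
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return [row_id for _, row_id in f.entries[start:end]]


def range_query(files, low, high):
    """Prune by metadata, scan the survivors in parallel, merge row ids."""
    candidates = prune(files, low, high)
    if not candidates:
        return []
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda f: scan_range(f, low, high), candidates))
    return sorted(row_id for part in parts for row_id in part)


def is_null_query(files):
    """IS NULL only needs to visit files flagged with has_nulls."""
    return sorted(r for f in files if f.has_nulls for r in f.null_row_ids)


files = [
    IndexFileMeta(3, 7, False, [(3, 4), (7, 1)]),
    IndexFileMeta(19, 42, True, [(19, 2), (25, 6), (42, 0)], [8]),
    IndexFileMeta(61, 88, False, [(61, 5), (70, 7), (88, 3)]),
]

print(range_query(files, 20, 65))   # range predicate: 20 <= key <= 65
print(range_query(files, 25, 25))   # point lookup is a degenerate range
print(is_null_query(files))
```

   A point lookup is treated here as a range query with equal bounds, so one 
pruning path serves both predicate kinds.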
   
   More details will follow in future PRs.
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!

