hang8929201 opened a new pull request, #5028: URL: https://github.com/apache/paimon/pull/5028
### Purpose

When a bitmap-indexed column has high cardinality, reading the entire dictionary in the first version of the bitmap index format takes a long time. A query with only a few predicates does not need the full dictionary, so building a secondary index on the dictionary improves the performance of predicate lookups against the bitmap.

Design document: https://docs.google.com/document/d/11dJlGlSX3dwYKKrPN0DQ2XQTsx6d9wI6DTBIiiBwomM/edit?tab=t.0

**performance**

Results for cardinality 1000, 10000, 30000, 50000, 80000, and 100000 (images not included).

### Tests

- org.apache.paimon.fileindex.bitmapindex.TestBitmapFileIndex
- org.apache.paimon.spark.SparkFileIndexITCase
- org.apache.paimon.benchmark.bitmap.BitmapIndexBenchmark

### API and Format

### Documentation

docs/content/concepts/spec/fileindex.md
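
The idea in the Purpose section, a secondary index over the dictionary, can be illustrated with a minimal in-memory sketch. This is not the format implemented in this PR: the class `BlockedDictionarySketch`, its fields, the `int` keys, and the fixed block size are all hypothetical names and simplifications chosen for illustration. The sketch assumes the dictionary is sorted by key and split into blocks, with the secondary index holding only the first key of each block, so a lookup touches one block instead of the whole dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: a sorted dictionary (key -> bitmap offset) split into
 * fixed-size blocks, plus a small secondary index holding each block's first key.
 * Assumption: keys are distinct and sorted ascending. Not Paimon's on-disk format.
 */
final class BlockedDictionarySketch {
    private final int[] blockFirstKeys;      // secondary index: first key of each block
    private final List<int[]> blockKeys;     // per block: sorted keys
    private final List<int[]> blockOffsets;  // per block: bitmap offsets, parallel to keys
    private int blocksDecoded = 0;           // how many blocks lookups have touched so far

    BlockedDictionarySketch(int[] sortedKeys, int[] offsets, int blockSize) {
        if (sortedKeys.length != offsets.length || blockSize <= 0) {
            throw new IllegalArgumentException("keys/offsets must align and blockSize must be positive");
        }
        int blockCount = (sortedKeys.length + blockSize - 1) / blockSize;
        blockFirstKeys = new int[blockCount];
        blockKeys = new ArrayList<>(blockCount);
        blockOffsets = new ArrayList<>(blockCount);
        for (int b = 0; b < blockCount; b++) {
            int from = b * blockSize;
            int to = Math.min(from + blockSize, sortedKeys.length);
            blockFirstKeys[b] = sortedKeys[from];
            blockKeys.add(Arrays.copyOfRange(sortedKeys, from, to));
            blockOffsets.add(Arrays.copyOfRange(offsets, from, to));
        }
    }

    /** Returns the bitmap offset stored for key, or -1 if the key is absent. */
    int lookup(int key) {
        // Step 1: binary search the small secondary index for the candidate block.
        int pos = Arrays.binarySearch(blockFirstKeys, key);
        int block = pos >= 0 ? pos : -pos - 2; // last block whose first key <= key
        if (block < 0) {
            return -1; // key is smaller than every key in the dictionary
        }
        // Step 2: search only that one block; the other blocks are never read.
        blocksDecoded++;
        int[] keys = blockKeys.get(block);
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 ? blockOffsets.get(block)[i] : -1;
    }

    int blocksDecoded() {
        return blocksDecoded;
    }
}
```

In this sketch each lookup costs one binary search over the block index plus one over a single block, which is the property that matters when the dictionary is large and only a few predicate values are probed.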
