steFaiz commented on PR #8250: URL: https://github.com/apache/paimon/pull/8250#issuecomment-4720147255
Hi @JingsongLi , I'd like to discuss whether this PR is essentially equivalent to the original approach from https://github.com/apache/paimon/pull/6956 ?As this picture illustrated: <img width="500" height="480" alt="image" src="https://github.com/user-attachments/assets/df8b5b21-6957-4ca1-a1c6-79fe32d0405f" /> the BTree index builder simply used a single fullRange (min/max RowID across all input data splits) for every generated index file. The topology was straightforward: scan → shuffle sort by <partition, indexed_col> → roll files on partition boundary or recordsPerRange threshold. The topo is extremly simple. ## Why the original PR is changed? if partitions are interleaved in RowID assignment, it may face data loss. For example: If there are partitions A & B: ``` text Part A: [0, 9] [100, 110] Part B: [10, 109] ``` And if only A is indexed, B is not. All index files generated will have row range `[0, 110]`. At query time, if the system saw that the query's RowID range was fully "covered" by the index file range, it might skip scanning raw data — causing rows from partition B ([1, 100]) to be silently lost. ## Why currently we can go back now the semantics have already shifted to index-only: once a global index exists and matches the filter columns, the query exclusively uses the index result bitmap to locate rows — there is no fallback scan for "uncovered" RowID regions. So: 1. The original "data loss" concern no longer applies 2. Even if partition B gets indexed later, since A and B's index files will inevitably have different ranges (different partitions wont have exactly same row range), the query path in GlobalIndexScanner will treat them as separate entries in the Map<Range, List<IndexFileMeta>> map. Both readers will be executed and their results OR-merged via UnionGlobalIndexReader. This is correct — no data is lost, no extra logic needed. My concern is that does this PR actually equivalent to the original implementation? But much much more complicated, with so many merge, split and map operations and have much more Flink/Spark tasks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
