steFaiz commented on PR #8250:
URL: https://github.com/apache/paimon/pull/8250#issuecomment-4720147255

   Hi @JingsongLi , I'd like to discuss whether this PR is essentially 
equivalent to the original approach from 
https://github.com/apache/paimon/pull/6956 ?As this picture illustrated:
   <img width="500" height="480" alt="image" 
src="https://github.com/user-attachments/assets/df8b5b21-6957-4ca1-a1c6-79fe32d0405f";
 />
   
   the BTree index builder simply used a single fullRange (min/max RowID across 
all input data splits) for every generated index file. The topology was 
straightforward: scan → shuffle sort by <partition, indexed_col> → roll files 
on partition boundary or recordsPerRange threshold. The topo is extremly simple.
   ## Why the original PR is changed?
   if partitions are interleaved in RowID assignment, it may face data loss. 
For example: If there are partitions A & B:
   ``` text
   Part A: [0, 9] [100, 110]
   Part B: [10, 109]
   ```
   And if only A is indexed, B is not. All index files generated will have row 
range `[0, 110]`. At query time, if the system saw that the query's RowID range 
was fully "covered" by the index file range, it might skip scanning raw data — 
causing rows from partition B ([1, 100]) to be silently lost.
   
   ## Why currently we can go back now
   the semantics have already shifted to index-only: once a global index exists 
and matches the filter columns, the query exclusively uses the index result 
bitmap to locate rows — there is no fallback scan for "uncovered" RowID regions.
   So:
   1. The original "data loss" concern no longer applies
   2. Even if partition B gets indexed later, since A and B's index files will 
inevitably have different ranges (different partitions wont have exactly same 
row range), the query path in GlobalIndexScanner will treat them as separate 
entries in the Map<Range, List<IndexFileMeta>> map. Both readers will be 
executed and their results OR-merged via UnionGlobalIndexReader. This is 
correct — no data is lost, no extra logic needed.
   
   My concern is that does this PR actually equivalent to the original 
implementation? But much much more complicated, with so many merge, split and 
map operations and have much more Flink/Spark tasks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to