TheR1sing3un commented on PR #7742:
URL: https://github.com/apache/paimon/pull/7742#issuecomment-4345621738

   @XiaoHongbo-Hope you're right — the two earlier tests didn't actually 
distinguish the buggy and fixed implementations. Both used inputs 
(same-key-twice on a single bucket) where every split ended up 
non-raw_convertible, which means the pre-fix loop body never ran and the 
fallback `return splits` returned everything anyway. Thanks for catching it.
   
   I've replaced them with a single, deterministic reproducer that does 
exercise the bug:
   
   - PK table partitioned on `dt`, `bucket=1`.
   - Partition `p1` — two overlapping writes on the same PK → 
**non-raw_convertible** split.
   - Partition `p2` — one write → **raw_convertible** split with `row_count=1`.
   
   `PrimaryKeyTableSplitGenerator` walks partitions in order, so the plan is 
`[non-raw (p1), raw (p2)]`. With `with_limit(1)` the pre-fix loop skips the 
non-raw split, then early-returns right after the raw one, so 
`limited_splits=[raw]` and p1's data is silently dropped.
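
To make the failure mode concrete, here is a minimal standalone sketch of the loop shape described above. It is not the pypaimon code: `Split`, `limit_pre_fix`, and `limit_fixed` are hypothetical names, and the fixed variant is just one plausible way to retain non-raw splits, assumed for illustration rather than copied from the PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Split:
    # Hypothetical stand-in for a planned split; not the pypaimon class.
    partition: str
    raw_convertible: bool
    row_count: int


def limit_pre_fix(splits: List[Split], limit: int) -> List[Split]:
    """Sketch of the buggy shape: non-raw splits are skipped and never re-added."""
    limited, rows = [], 0
    for s in splits:
        if not s.raw_convertible:
            continue  # skipped here ...
        limited.append(s)
        rows += s.row_count
        if rows >= limit:
            return limited  # ... and lost on this early return
    return splits  # fallback: raw budget never met, return everything


def limit_fixed(splits: List[Split], limit: int) -> List[Split]:
    """Assumed fix: keep non-raw splits seen before the raw budget is met."""
    limited, rows = [], 0
    for s in splits:
        limited.append(s)
        if s.raw_convertible:
            rows += s.row_count
            if rows >= limit:
                return limited
    return splits


# Plan from the reproducer: [non-raw (p1), raw (p2)], limit 1.
plan = [Split("p1", False, 2), Split("p2", True, 1)]
pre_fix_partitions = [s.partition for s in limit_pre_fix(plan, 1)]
fixed_partitions = [s.partition for s in limit_fixed(plan, 1)]

# Shape of the two earlier tests: every split is non-raw, so the
# fallback returns everything and both variants agree.
all_non_raw = [Split("p1", False, 2), Split("p1", False, 2)]
masked = limit_pre_fix(all_non_raw, 1) == limit_fixed(all_non_raw, 1)
```

In this sketch `pre_fix_partitions` contains only `p2`, while `fixed_partitions` contains both `p1` and `p2`. `masked` is true, which mirrors why the earlier same-key-twice inputs could not tell the two implementations apart.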
   
   End-to-end check:
   
   ```
   $ git checkout origin/master -- pypaimon/read/scanner/file_scanner.py
   $ pytest ...test_limit_drops_non_raw_split_after_raw_budget_is_met
   FAILED ... AssertionError: 1 != 2
   $ git checkout HEAD -- pypaimon/read/scanner/file_scanner.py
   $ pytest ...test_limit_drops_non_raw_split_after_raw_budget_is_met
   1 passed
   ```
   
   Force-pushed 
[3b7c7484b](https://github.com/apache/paimon/pull/7742/commits/3b7c7484b) with 
the new reproducer and an updated commit message / PR description that walks 
through why the bug requires `[non-raw, raw]` ordering. PTAL.


