baiyangtx commented on PR #7866:
URL: https://github.com/apache/paimon/pull/7866#issuecomment-4532036530

   Thanks @JingsongLi for the thorough review. I've addressed most of the 
feedback:
   
   1. **ADD filter regression** — Restored `FileEntry.addFilter()` in both 
`mergeUnsorted()` and `mergeSortedByPartition()`. DELETE entries are now 
filtered at the IO level as before.
   
   2. **IOManager hard-coded tmpdir** — Added `@Nullable IOManager` parameter 
to `tryFullCompaction` and `merge`. Falls back to 
`System.getProperty("java.io.tmpdir")` when null is passed, so existing callers 
are unaffected.
   
   3. **Inconsistent sort key** — Added `fileName` as the 5th field in the sort 
row, serving as a deterministic tiebreaker after (partition, bucket, level).
   
   4. **manifest.delta.sorted default** — Changed from `true` to `false`, 
consistent with `manifest.merge.sort-on-commit`.
   
   5. **Test assertion for internal order** — Reverted 
`containsExactlyInAnyOrderElementsOf` back to `containsExactlyElementsOf` for 
the `testLocalSortClusterUnpartitionedTable` internal-order assertion. With 
parallelism=1 and a single output file, the order is deterministic.
   
   6. **Minor** — Renamed `partitionRmpr` → `partitionCmp`.
   
   **For discussion — parallel manifest reads (#2)**: The original 
`sequentialBatchedExecute` was removed because the sorting path needs all 
entries collected before spilling to external sort. I'd appreciate guidance on 
whether to restore parallel reads for the non-sorted path only, or if the 
simplicity of unified sequential reads is acceptable here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to