baiyangtx commented on PR #7866:
URL: https://github.com/apache/paimon/pull/7866#issuecomment-4532036530
Thanks @JingsongLi for the thorough review. I've addressed most of the
feedback:
1. **ADD filter regression** — Restored `FileEntry.addFilter()` in both
`mergeUnsorted()` and `mergeSortedByPartition()`. DELETE entries are now
filtered at the IO level as before.
2. **IOManager hard-coded tmpdir** — Added `@Nullable IOManager` parameter
to `tryFullCompaction` and `merge`. Falls back to
`System.getProperty("java.io.tmpdir")` when null is passed, so existing callers
are unaffected.
3. **Inconsistent sort key** — Added `fileName` as the 5th field in the sort
row, serving as a deterministic tiebreaker after (partition, bucket, level).
4. **manifest.delta.sorted default** — Changed from `true` to `false`,
consistent with `manifest.merge.sort-on-commit`.
5. **Test assertion for internal order** — Reverted
`containsExactlyInAnyOrderElementsOf` back to `containsExactlyElementsOf` for
the `testLocalSortClusterUnpartitionedTable` internal-order assertion. With
parallelism=1 and a single output file, the order is deterministic.
6. **Minor** — Renamed `partitionRmpr` → `partitionCmp`.
**For discussion — parallel manifest reads (#2)**: The original
`sequentialBatchedExecute` was removed because the sorting path needs all
entries collected before spilling to external sort. I'd appreciate guidance on
whether to restore parallel reads for the non-sorted path only, or if the
simplicity of unified sequential reads is acceptable here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]