TheR1sing3un opened a new pull request, #377:
URL: https://github.com/apache/paimon-rust/pull/377
### Purpose
Linked issue: close #376
When planning a scan, `merge_manifest_entries`
(`crates/paimon/src/table/table_scan.rs`) nets manifest `ADD`/`DELETE` entries
by `(partition, bucket, file_name)`, **omitting `level`**. A single-sorted-run
compaction upgrades a file *in place* (Java `PojoDataFileMeta.upgrade` reuses
the file name and only bumps `level`), emitting `DELETE f@oldLevel` + `ADD
f@newLevel` with the same name. The level-less key lets the `DELETE` cancel the
upgraded `ADD` too, so the live file is dropped from the plan and **its rows
are silently lost on read**.
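A minimal sketch of the failure mode, not the actual pre-fix code: `Entry`, `Kind`, `entry`, and `net_without_level` are hypothetical stand-ins, and the partition is omitted for brevity. It only illustrates how a key without `level` lets the `DELETE` at the old level also cancel the `ADD` at the new level.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified manifest entry kind.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    Add,
    Delete,
}

/// Hypothetical stand-in for a manifest entry (partition omitted).
#[derive(Clone, Debug)]
struct Entry {
    kind: Kind,
    bucket: i32,
    level: i32,
    file_name: String,
}

/// Convenience constructor; every entry lands in bucket 0.
fn entry(kind: Kind, file_name: &str, level: i32) -> Entry {
    Entry {
        kind,
        bucket: 0,
        level,
        file_name: file_name.to_string(),
    }
}

/// Nets ADD/DELETE entries with a key that omits `level`.
/// Returns the surviving `(file_name, level)` pairs in input order.
fn net_without_level(entries: &[Entry]) -> Vec<(String, i32)> {
    // The key is only (bucket, file_name), so f@L0 and f@L5 look identical.
    let deleted: HashSet<(i32, &str)> = entries
        .iter()
        .filter(|e| e.kind == Kind::Delete)
        .map(|e| (e.bucket, e.file_name.as_str()))
        .collect();

    entries
        .iter()
        .filter(|e| {
            e.kind == Kind::Add && !deleted.contains(&(e.bucket, e.file_name.as_str()))
        })
        .map(|e| (e.file_name.clone(), e.level))
        .collect()
}
```

Feeding this sketch `ADD f@L0, DELETE f@L0, ADD f@L5, ADD g@L0` leaves only `g@L0`: the upgraded `f@L5` is dropped together with `f@L0`.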
This fix mirrors the Java semantics: `FileEntry.Identifier` includes
`level`, and `AbstractFileStoreScan.readAndMergeFileEntries` nets add/delete by
that full identifier.
### Brief change log
- Rewrite `merge_manifest_entries` to collect every `DELETE`'s full
`Identifier` (incl. `level`) and keep `ADD`s whose `identifier()` is not
deleted — reusing the existing `ManifestEntry::identifier()`. This matches Java
`AbstractFileStoreScan.readAndMergeFileEntries` and is independent of
add/delete ordering.
- Align `Identifier`'s `Hash` impl with its `Eq` (and Java
`FileEntry.Identifier.hashCode`): hash all identity fields (`partition, bucket,
level, file_name, extra_files, embedded_index, external_path`). The previous
`Hash` only hashed `partition/bucket/file_name`, so in-place upgraded files all
collided in the dedup set.
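The two changes above can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `SketchIdentifier`, `SketchEntry`, `SketchKind`, and `merge_entries_sketch` are hypothetical names, and the field types (for example `Vec<u8>` for the partition) are simplified guesses. Deriving `Hash` alongside `Eq` is one way to keep the two in step, since both then cover every identity field.

```rust
use std::collections::HashSet;

/// Hypothetical identifier; `Hash` and `Eq` are both derived, so they
/// cover the same fields, including `level`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct SketchIdentifier {
    partition: Vec<u8>, // assumption: serialized partition bytes
    bucket: i32,
    level: i32,
    file_name: String,
    extra_files: Vec<String>,
    embedded_index: Option<Vec<u8>>,
    external_path: Option<String>,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SketchKind {
    Add,
    Delete,
}

/// Hypothetical manifest entry carrying its kind and identity.
#[derive(Clone, Debug)]
struct SketchEntry {
    kind: SketchKind,
    id: SketchIdentifier,
}

impl SketchEntry {
    fn new(kind: SketchKind, file_name: &str, level: i32) -> Self {
        SketchEntry {
            kind,
            id: SketchIdentifier {
                partition: Vec::new(),
                bucket: 0,
                level,
                file_name: file_name.to_string(),
                extra_files: Vec::new(),
                embedded_index: None,
                external_path: None,
            },
        }
    }

    fn identifier(&self) -> SketchIdentifier {
        self.id.clone()
    }
}

/// Keeps every ADD whose full identifier is not matched by any DELETE.
/// The delete set is built first, so the result does not depend on the
/// relative order of ADD and DELETE entries.
fn merge_entries_sketch(entries: Vec<SketchEntry>) -> Vec<SketchEntry> {
    let deleted: HashSet<SketchIdentifier> = entries
        .iter()
        .filter(|e| e.kind == SketchKind::Delete)
        .map(|e| e.identifier())
        .collect();

    entries
        .into_iter()
        .filter(|e| e.kind == SketchKind::Add && !deleted.contains(&e.identifier()))
        .collect()
}
```

With the level in the key, `DELETE f@L0` cancels only `ADD f@L0`, and the upgraded `ADD f@L5` survives regardless of where the `DELETE` appears in the input.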
### Tests
Two hermetic unit tests (pure logic, no external deps):
- `table_scan::tests::test_merge_manifest_entries_keeps_in_place_upgraded_file`:
feeds `ADD f@L0, DELETE f@L0, ADD f@L5, ADD g@L0` and asserts the live set is
`{f@L5, g@L0}`.
- `manifest_entry::tests::test_identifier_distinguishes_in_place_level_upgrade`:
`f@L0` and `f@L5` are unequal and do not alias in a `HashSet<Identifier>`.
Before/after (revert the fix → the first test fails, restore → passes):
```
# before fix
test ... test_merge_manifest_entries_keeps_in_place_upgraded_file ... FAILED
left: [("g.parquet", 0)]
right: [("f.parquet", 5), ("g.parquet", 0)]
# after fix
test result: ok. 684 passed; 0 failed
```
`cargo fmt --all -- --check` and `cargo clippy -p paimon --all-targets` are
clean.
### API and Format
No public API or storage-format changes.
### Documentation
Not needed.