TheR1sing3un opened a new issue, #376: URL: https://github.com/apache/paimon-rust/issues/376
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/paimon-rust/issues) and found nothing similar.

### Please describe the bug 🐞

When planning a scan, `merge_manifest_entries` in `crates/paimon/src/table/table_scan.rs` nets manifest `ADD`/`DELETE` entries using the identity tuple `(partition, bucket, file_name)`; it **omits `level`**:

```rust
let deleted_keys: HashSet<(&[u8], i32, &str)> = delete_entries
    .iter()
    .map(|e| (e.partition(), e.bucket(), e.file().file_name.as_str()))
    .collect();
// keep adds whose (partition, bucket, file_name) is not in deleted_keys
```

A single-sorted-run compaction upgrades a file **in place**: Java `PojoDataFileMeta.upgrade(newLevel)` reuses the same `fileName` and only bumps `level`, so the commit emits `DELETE f@oldLevel` + `ADD f@newLevel` with the **same file name**. Because the dedup key ignores `level`, the `DELETE@oldLevel` cancels **both** the old add and the upgraded `ADD@newLevel`, so the live (upgraded) file is dropped from the plan and **its rows are silently lost on read**.

This diverges from Java, where `FileEntry.Identifier` includes `level` in `equals`/`hashCode`, and `AbstractFileStoreScan.readAndMergeFileEntries` nets add/delete by that full identifier, so Java is unaffected.

**Manifest entries for an upgraded file (what the scan sees):**

```
ADD    f.parquet level=0   (initial write)
DELETE f.parquet level=0   (compaction)
ADD    f.parquet level=5   (same file upgraded in place)
ADD    g.parquet level=0   (later write)
```

Expected live set: `{f@L5, g@L0}`. Actual (buggy): `{g@L0}`; `f` is lost entirely.

**Minimal reproduction (real):** a primary-key table with a single `INSERT` (one sorted run), then `CALL sys.compact` (in-place upgrade), then read. The table reads back as if the compacted file does not exist (rows missing / empty).

**Reproduction (hermetic unit test), reverting only the fix shows:**

```
test table::table_scan::tests::test_merge_manifest_entries_keeps_in_place_upgraded_file ... FAILED
assertion `left == right` failed: upgraded file (f@L5) must survive; only f@L0 is cancelled by the DELETE
  left: [("g.parquet", 0)]
 right: [("f.parquet", 5), ("g.parquet", 0)]
```

Environment: apache/paimon-rust `main` (also present on current `main`). Affects any primary-key table that has undergone a single-run compaction (level upgrade).

### Solution

Net add/delete by the **full file identity including `level`** (and the other fields of `FileEntry.Identifier`), mirroring Java `AbstractFileStoreScan.readAndMergeFileEntries`: collect every `DELETE` entry's `Identifier`, then keep `ADD`s whose `identifier()` is not deleted. The codebase already has a correct `ManifestEntry::identifier()` (incl. `level`); the scan dedup just doesn't use it.

Also align `Identifier`'s `Hash` impl with its `Eq` (the current `Hash` only hashes `partition/bucket/file_name`, so upgraded files all collide) to match Java `FileEntry.Identifier.hashCode`.

### Are you willing to submit a PR?

- [x] I'm willing to submit a PR!
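
The netting proposed under "Solution" can be sketched as follows. This is a minimal illustration, not the actual patch: `Kind`, `Identifier`, `Entry`, `entry`, and `live_files` are hypothetical, simplified stand-ins for the real manifest types, and the identifier here carries only `partition`, `bucket`, `level`, and `file_name` (the real `FileEntry.Identifier` has further fields). Deriving both `Eq` and `Hash` over the same fields keeps the two consistent, which is the second point of the proposal.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the manifest entry kind.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    Add,
    Delete,
}

// Hypothetical, reduced file identity. `level` is part of both Eq and Hash,
// so f@L0 and f@L5 are distinct keys.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Identifier {
    partition: Vec<u8>,
    bucket: i32,
    level: i32,
    file_name: String,
}

#[derive(Clone, Debug)]
struct Entry {
    kind: Kind,
    id: Identifier,
}

// Convenience constructor for the example: empty partition, bucket 0.
fn entry(kind: Kind, file_name: &str, level: i32) -> Entry {
    Entry {
        kind,
        id: Identifier {
            partition: Vec::new(),
            bucket: 0,
            level,
            file_name: file_name.to_string(),
        },
    }
}

// Collect the identifiers of all DELETE entries, then keep every ADD whose
// full identifier (including level) was not deleted.
fn live_files(entries: &[Entry]) -> Vec<(String, i32)> {
    let deleted: HashSet<&Identifier> = entries
        .iter()
        .filter(|e| e.kind == Kind::Delete)
        .map(|e| &e.id)
        .collect();

    entries
        .iter()
        .filter(|e| e.kind == Kind::Add && !deleted.contains(&e.id))
        .map(|e| (e.id.file_name.clone(), e.id.level))
        .collect()
}
```

Fed the four manifest entries listed in the bug description, `live_files` keeps `f.parquet` at level 5 and `g.parquet` at level 0, because the `DELETE` at level 0 only matches the level-0 `ADD`.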
