TheR1sing3un opened a new issue, #376:
URL: https://github.com/apache/paimon-rust/issues/376

   ### Search before asking
   
   - [x] I searched in the 
[issues](https://github.com/apache/paimon-rust/issues) and found nothing 
similar.
   
   ### Please describe the bug 🐞
   
   When planning a scan, `merge_manifest_entries` in 
`crates/paimon/src/table/table_scan.rs` nets manifest `ADD`/`DELETE` entries 
using the identity tuple `(partition, bucket, file_name)` — it **omits 
`level`**:
   
   ```rust
   let deleted_keys: HashSet<(&[u8], i32, &str)> = delete_entries
       .iter()
       .map(|e| (e.partition(), e.bucket(), e.file().file_name.as_str()))
       .collect();
   // keep adds whose (partition, bucket, file_name) is not in deleted_keys
   ```
   
   A single-sorted-run compaction upgrades a file **in place**: Java 
`PojoDataFileMeta.upgrade(newLevel)` reuses the same `fileName` and only bumps 
`level`, so the commit emits `DELETE f@oldLevel` + `ADD f@newLevel` with the 
**same file name**. Because the dedup key ignores `level`, the 
`DELETE@oldLevel` cancels **both** the old add and the upgraded `ADD@newLevel`, 
so the live (upgraded) file is dropped from the plan and **its rows are 
silently lost on read**.
   
   This diverges from Java, where `FileEntry.Identifier` includes `level` in 
`equals`/`hashCode`, and `AbstractFileStoreScan.readAndMergeFileEntries` nets 
add/delete by that full identifier — so Java is unaffected.
   
   **Manifest entries for an upgraded file (what the scan sees):**
   
   ```
   ADD    f.parquet  level=0   (initial write)
   DELETE f.parquet  level=0   (compaction)
   ADD    f.parquet  level=5   (same file upgraded in place)
   ADD    g.parquet  level=0   (later write)
   ```
   Expected live set: `{f@L5, g@L0}`. Actual (buggy): `{g@L0}` — `f` lost 
entirely.
   
   **Minimal reproduction (real):** a primary-key table with a single `INSERT` 
(one sorted run), then `CALL sys.compact` (in-place upgrade), then read — the 
table reads back as if the compacted file does not exist (rows missing / empty).
   
   **Reproduction (hermetic unit test):** with only the fix reverted, the test fails:
   ```
   test table::table_scan::tests::test_merge_manifest_entries_keeps_in_place_upgraded_file ... FAILED
   assertion `left == right` failed: upgraded file (f@L5) must survive; only f@L0 is cancelled by the DELETE
     left: [("g.parquet", 0)]
    right: [("f.parquet", 5), ("g.parquet", 0)]
   ```
   
   Environment: apache/paimon-rust, current `main`. Affects any primary-key 
table that has undergone a single-run compaction (level upgrade).
   
   ### Solution
   
   Net add/delete by the **full file identity including `level`** (and the 
other fields of `FileEntry.Identifier`), mirroring Java 
`AbstractFileStoreScan.readAndMergeFileEntries`: collect every `DELETE` entry's 
`Identifier`, then keep `ADD`s whose `identifier()` is not deleted. The 
codebase already has a correct `ManifestEntry::identifier()` (incl. `level`) — 
the scan dedup just doesn't use it. Also align `Identifier`'s `Hash` impl with 
its `Eq`: the current `Hash` only hashes `partition`, `bucket`, and 
`file_name`, so every level of an upgraded file collides on the same hash. 
Hashing the full identity matches Java `FileEntry.Identifier.hashCode`.
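
One way to keep `Hash` and `Eq` from drifting apart is to derive both from the same field list. The sketch below is illustrative only: this `Identifier` is a hypothetical reduced struct carrying just the fields named in this issue (the real one has additional fields), and `id`, `hash_of`, and `surviving_adds` are helper names made up for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Hypothetical reduced identifier. Deriving `PartialEq`, `Eq`, and `Hash`
// together guarantees they all cover the same fields, including `level`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Identifier {
    partition: Vec<u8>,
    bucket: i32,
    level: i32,
    file_name: String,
}

// Assumption: a single empty partition and bucket 0.
fn id(file_name: &str, level: i32) -> Identifier {
    Identifier {
        partition: Vec::new(),
        bucket: 0,
        level,
        file_name: file_name.to_string(),
    }
}

// Helper to compute a hash value with the standard library's default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

// Collect every DELETE identifier, then keep ADDs whose identifier is not deleted.
fn surviving_adds(adds: &[Identifier], deletes: &[Identifier]) -> Vec<Identifier> {
    let deleted: HashSet<&Identifier> = deletes.iter().collect();
    adds.iter()
        .filter(|a| !deleted.contains(*a))
        .cloned()
        .collect()
}
```

With `deletes = [id("f.parquet", 0)]` and `adds = [id("f.parquet", 0), id("f.parquet", 5), id("g.parquet", 0)]`, `surviving_adds` keeps the level-5 `f.parquet` and `g.parquet`, because the level-0 and level-5 identifiers no longer compare equal.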
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!

