TheR1sing3un opened a new pull request, #377:
URL: https://github.com/apache/paimon-rust/pull/377

   ### Purpose
   
   Linked issue: close #376
   
   When planning a scan, `merge_manifest_entries` 
(`crates/paimon/src/table/table_scan.rs`) nets manifest `ADD`/`DELETE` entries 
by `(partition, bucket, file_name)`, **omitting `level`**. A single-sorted-run 
compaction upgrades a file *in place* (Java `PojoDataFileMeta.upgrade` reuses 
the file name and only bumps `level`), emitting `DELETE f@oldLevel` + `ADD 
f@newLevel` with the same name. The level-less key lets the `DELETE` cancel the 
upgraded `ADD` too, so the live file is dropped from the plan and **its rows 
are silently lost on read**.
   
   This fix mirrors the Java semantics: `FileEntry.Identifier` includes 
`level`, and `AbstractFileStoreScan.readAndMergeFileEntries` nets add/delete by 
that full identifier.
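
A minimal sketch of the failure mode, not the actual `table_scan.rs` code: `Kind`, `Id`, `Entry`, `entry`, `merge_without_level` and `merge_with_level` are hypothetical, simplified stand-ins (the partition is omitted and a single bucket is assumed). It only illustrates how dropping `level` from the netting key lets a `DELETE f@L0` also cancel `ADD f@L5`.

```rust
use std::collections::HashSet;

// Hypothetical, simplified stand-ins for the real manifest types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kind {
    Add,
    Delete,
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Id {
    pub bucket: i32,
    pub level: i32,
    pub file_name: String,
}

#[derive(Clone, Debug)]
pub struct Entry {
    pub kind: Kind,
    pub id: Id,
}

/// Helper for building an entry in bucket 0 (illustration only).
pub fn entry(kind: Kind, file_name: &str, level: i32) -> Entry {
    Entry {
        kind,
        id: Id { bucket: 0, level, file_name: file_name.to_string() },
    }
}

/// Nets ADD/DELETE by (bucket, file_name) only: a DELETE at any level
/// cancels every ADD that shares the file name.
pub fn merge_without_level(entries: &[Entry]) -> Vec<(String, i32)> {
    let deleted: HashSet<(i32, &str)> = entries
        .iter()
        .filter(|e| e.kind == Kind::Delete)
        .map(|e| (e.id.bucket, e.id.file_name.as_str()))
        .collect();
    entries
        .iter()
        .filter(|e| e.kind == Kind::Add)
        .filter(|e| !deleted.contains(&(e.id.bucket, e.id.file_name.as_str())))
        .map(|e| (e.id.file_name.clone(), e.id.level))
        .collect()
}

/// Nets ADD/DELETE by the full identifier, including `level`:
/// a DELETE only cancels the ADD at the same level.
pub fn merge_with_level(entries: &[Entry]) -> Vec<(String, i32)> {
    let deleted: HashSet<&Id> = entries
        .iter()
        .filter(|e| e.kind == Kind::Delete)
        .map(|e| &e.id)
        .collect();
    entries
        .iter()
        .filter(|e| e.kind == Kind::Add)
        .filter(|e| !deleted.contains(&e.id))
        .map(|e| (e.id.file_name.clone(), e.id.level))
        .collect()
}
```

Given `ADD f@L0, DELETE f@L0, ADD f@L5, ADD g@L0`, the level-less variant keeps only `g@L0`, while the full-identifier variant keeps `f@L5` and `g@L0`.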
   
   ### Brief change log
   
   - Rewrite `merge_manifest_entries` to collect every `DELETE`'s full 
`Identifier` (incl. `level`) and keep `ADD`s whose `identifier()` is not 
deleted — reusing the existing `ManifestEntry::identifier()`. This matches Java 
`AbstractFileStoreScan.readAndMergeFileEntries` and is independent of 
add/delete ordering.
   - Align `Identifier`'s `Hash` impl with its `Eq` (and Java 
`FileEntry.Identifier.hashCode`): hash all identity fields (`partition, bucket, 
level, file_name, extra_files, embedded_index, external_path`). The previous 
`Hash` only hashed `partition/bucket/file_name`, so in-place upgraded files all 
collided in the dedup set.
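
The `Hash`/`Eq` alignment in the second item can be sketched as follows. `FileId`, `file_id` and `distinct_count` are hypothetical names, and the struct carries only a subset of the real `Identifier` fields; this is an illustration of the idea under those assumptions, not the crate's implementation.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Hypothetical, reduced identifier: the real one also carries
// extra_files, embedded_index and external_path.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct FileId {
    pub partition: Vec<u8>,
    pub bucket: i32,
    pub level: i32,
    pub file_name: String,
}

impl Hash for FileId {
    // Hash every field that `Eq` compares, so identifiers that differ
    // only in `level` are not fed identical input to the hasher.
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.partition.hash(state);
        self.bucket.hash(state);
        self.level.hash(state);
        self.file_name.hash(state);
    }
}

/// Helper for building an identifier in an empty partition, bucket 0.
pub fn file_id(file_name: &str, level: i32) -> FileId {
    FileId {
        partition: Vec::new(),
        bucket: 0,
        level,
        file_name: file_name.to_string(),
    }
}

/// Number of distinct identifiers once collected into a `HashSet`.
pub fn distinct_count(ids: &[FileId]) -> usize {
    ids.iter().collect::<HashSet<_>>().len()
}
```

With this shape, `f@L0` and `f@L5` compare unequal and occupy separate slots in a `HashSet`, while two copies of `f@L5` still collapse to one.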
   
   ### Tests
   
   Two hermetic unit tests (pure logic, no external deps):
   
   - `table_scan::tests::test_merge_manifest_entries_keeps_in_place_upgraded_file`: 
feeds `ADD f@L0, DELETE f@L0, ADD f@L5, ADD g@L0` and asserts the live set is 
`{f@L5, g@L0}`.
   - `manifest_entry::tests::test_identifier_distinguishes_in_place_level_upgrade`: 
`f@L0` and `f@L5` are unequal and do not alias in a `HashSet<Identifier>`.
   
   Before/after (revert the fix → the first test fails, restore → passes):
   ```
   # before fix
   test ... test_merge_manifest_entries_keeps_in_place_upgraded_file ... FAILED
     left: [("g.parquet", 0)]
    right: [("f.parquet", 5), ("g.parquet", 0)]
   # after fix
   test result: ok. 684 passed; 0 failed
   ```
   `cargo fmt --all -- --check` and `cargo clippy -p paimon --all-targets` are 
clean.
   
   ### API and Format
   
   No public API or storage-format changes.
   
   ### Documentation
   
   Not needed.

