[PR] fix(table): use the snapshot's schema version for time travel reads [paimon-rust]

via GitHub Thu, 11 Jun 2026 09:03:33 -0700


TheR1sing3un opened a new pull request, #379:
URL: https://github.com/apache/paimon-rust/pull/379


   ### Purpose
   
   Time travel (`scan.version` / `scan.timestamp-millis` / SQL `VERSION AS OF` 
/ `TIMESTAMP AS OF`) previously only switched which snapshot was scanned, while 
the table schema, scan pruning, the read evolution target, and the DataFusion 
provider all kept using the **latest** schema — `Snapshot.schemaId` was never 
consumed on the read path. Reading an old snapshot therefore lost its 
historical shape: columns dropped later were invisible, and type updates were 
applied retroactively to historical data.
   
   Java switches the table to the snapshot's schema in 
`AbstractFileStoreTable.copy(dynamicOptions)` → `tryTimeTravel` → 
`schemaManager.schema(snapshot.schemaId()).copy(mergedOptions)`. This PR 
mirrors that behavior.
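
The difference between the old and new read paths can be illustrated with a small, self-contained sketch. It is not the paimon-rust implementation: `Schema`, `Snapshot`, `SchemaStore`, and both lookup functions are hypothetical stand-ins that only model "use the latest schema" versus "use the schema the snapshot was written with".

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a table schema: an id plus its column names.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    id: i64,
    columns: Vec<String>,
}

/// Hypothetical stand-in for a snapshot that records its schema id.
#[derive(Debug, Clone)]
struct Snapshot {
    id: i64,
    schema_id: i64,
}

/// Hypothetical in-memory schema store, ordered by schema id.
struct SchemaStore {
    schemas: BTreeMap<i64, Schema>,
}

impl SchemaStore {
    /// The schema with the highest id.
    fn latest(&self) -> Option<&Schema> {
        self.schemas.values().next_back()
    }

    /// The schema persisted under a specific id.
    fn by_id(&self, id: i64) -> Option<&Schema> {
        self.schemas.get(&id)
    }
}

/// Models the old behavior: the snapshot's schema id is ignored.
fn schema_before_fix<'a>(store: &'a SchemaStore, _snapshot: &Snapshot) -> Option<&'a Schema> {
    store.latest()
}

/// Models the new behavior: the snapshot's schema id selects the schema.
fn schema_after_fix<'a>(store: &'a SchemaStore, snapshot: &Snapshot) -> Option<&'a Schema> {
    store.by_id(snapshot.schema_id)
}

/// Two schema versions: `legacy` exists in schema 0 and is dropped in schema 1.
fn demo_store() -> SchemaStore {
    let mut schemas = BTreeMap::new();
    schemas.insert(0, Schema { id: 0, columns: vec!["id".into(), "legacy".into()] });
    schemas.insert(1, Schema { id: 1, columns: vec!["id".into()] });
    SchemaStore { schemas }
}
```

For a snapshot written under schema 0, `schema_before_fix` yields schema 1 and hides `legacy`, while `schema_after_fix` yields schema 0 and keeps it.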
   
   ### Brief change log
   
   - Add async `Table::copy_with_time_travel(extra)`, mirroring Java 
`copy(dynamicOptions)`: merge options, resolve the time-travel selector, and 
when the resolved snapshot has a different schema id, replace the table schema 
with the snapshot's schema while keeping the merged options. Resolution 
failures fall back silently (Java `tryTimeTravel` catch-all); invalid selectors 
still fail at scan planning, so existing error behavior is unchanged. The 
existing `Table::copy_with_options` keeps its non-traveling semantics (Java 
`copyWithoutTimeTravel`).
   - Add `TableSchema::copy_with_replaced_options`, matching Java 
`TableSchema.copy(Map)` (options are replaced, not merged; 
id/fields/keys/comment/timeMillis preserved).
   - Extract snapshot resolution from `TableScan::resolve_snapshot` into 
`table::time_travel::travel_to_snapshot` (Java `TimeTravelUtil`) and share it.
   - Wire up the DataFusion entry points: the `SQLContext` time-travel rewrite 
path, the catalog provider's dynamic-options path (`SET 
'paimon.scan.version'`), and `PaimonRelationPlanner` (bridged via the existing 
`block_on_with_runtime`, since the planner hook is synchronous).
   - Reject `new_write()` on a time-traveled table copy. Java has no runtime 
guard but avoids the situation structurally, because its write paths always use 
`copyWithoutTimeTravel`. The shared DataFusion provider here can serve both 
reads and INSERT, so an explicit error is safer than silently writing data 
shaped like the old schema. The guard is isolated and easy to drop if undesired.
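
The flow described in the list above can be sketched roughly as follows. This is a synchronous, in-memory model under stated assumptions, not the code in this PR: the struct layouts, `demo_table`, the error strings, and the numeric-only handling of `scan.version` are all hypothetical (the real method is async and resolves tags and timestamps as well). The sketch also assumes the write guard trips only when the schema was actually swapped.

```rust
use std::collections::HashMap;

/// Hypothetical alias for table options.
type Options = HashMap<String, String>;

#[derive(Debug, Clone, PartialEq)]
struct TableSchema {
    id: i64,
    fields: Vec<String>,
    options: Options,
}

impl TableSchema {
    /// Options are replaced, not merged; id and fields are preserved.
    fn copy_with_replaced_options(&self, options: Options) -> TableSchema {
        TableSchema { id: self.id, fields: self.fields.clone(), options }
    }
}

#[derive(Debug, Clone)]
struct Snapshot {
    id: i64,
    schema_id: i64,
}

/// Hypothetical table holding its schemas and snapshots in memory.
#[derive(Debug, Clone)]
struct Table {
    schema: TableSchema,
    schemas: HashMap<i64, TableSchema>,
    snapshots: Vec<Snapshot>,
    time_traveled: bool,
}

impl Table {
    /// Resolve `scan.version` to a snapshot (numeric ids only in this sketch).
    fn travel_to_snapshot(&self, options: &Options) -> Result<Option<&Snapshot>, String> {
        let Some(version) = options.get("scan.version") else {
            return Ok(None);
        };
        let id: i64 = version
            .parse()
            .map_err(|_| format!("invalid scan.version: {version}"))?;
        match self.snapshots.iter().find(|s| s.id == id) {
            Some(snapshot) => Ok(Some(snapshot)),
            None => Err(format!("snapshot {id} not found")),
        }
    }

    /// Merge options, resolve the selector, and swap in the snapshot's schema
    /// when its id differs. Resolution failures fall back silently.
    fn copy_with_time_travel(&self, extra: Options) -> Table {
        let mut merged = self.schema.options.clone();
        merged.extend(extra);

        let mut schema = self.schema.copy_with_replaced_options(merged.clone());
        let mut time_traveled = false;

        let resolved = self.travel_to_snapshot(&merged);
        if let Ok(Some(snapshot)) = resolved {
            if snapshot.schema_id != self.schema.id {
                if let Some(historical) = self.schemas.get(&snapshot.schema_id) {
                    schema = historical.copy_with_replaced_options(merged);
                    time_traveled = true;
                }
            }
        }

        Table {
            schema,
            schemas: self.schemas.clone(),
            snapshots: self.snapshots.clone(),
            time_traveled,
        }
    }

    /// Refuse writes on a copy whose schema was switched by time travel.
    fn new_write(&self) -> Result<(), String> {
        if self.time_traveled {
            Err("cannot write through a time-traveled table copy".to_string())
        } else {
            Ok(())
        }
    }
}

/// Snapshot 1 was written under schema 0, snapshot 2 under schema 1 (latest).
fn demo_table() -> Table {
    let schema0 = TableSchema {
        id: 0,
        fields: vec!["id".to_string(), "legacy".to_string()],
        options: HashMap::from([("stale".to_string(), "x".to_string())]),
    };
    let schema1 = TableSchema {
        id: 1,
        fields: vec!["id".to_string()],
        options: HashMap::from([("bucket".to_string(), "4".to_string())]),
    };
    Table {
        schema: schema1.clone(),
        schemas: HashMap::from([(0, schema0), (1, schema1)]),
        snapshots: vec![Snapshot { id: 1, schema_id: 0 }, Snapshot { id: 2, schema_id: 1 }],
        time_traveled: false,
    }
}
```

Traveling `demo_table()` to `scan.version = 1` yields schema 0 carrying the merged options of the current table (`bucket` and `scan.version`) rather than schema 0's own persisted `stale` option, which is the replace-not-merge behavior. An unparseable version leaves the copy on the latest schema and still writable.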
   
   Because the schema switch happens at the `Table` level, scan stats pruning, 
the per-file evolution target, and the provider's Arrow schema stay consistent 
automatically; the existing field-id based stats devolution is unaffected 
(every file in an old snapshot has a `schema_id` less than or equal to the snapshot's schema id).
   
   ### Tests
   
   - New unit tests in `table::time_travel` (multi-schema fixture built by 
persisting `schema-0`/`schema-1` and committing one snapshot per version): 
schema switch by version/tag/timestamp, no-selector no-op, silent fallback for 
invalid/conflicting selectors, merged-options replacement semantics, write 
rejection, and an end-to-end read asserting the old snapshot returns only the 
old columns.
   - New `time_travel_schema_tests` in paimon-datafusion: `VERSION AS OF` 
(SQLContext path and the relation-planner path on a raw `SessionContext`), 
`TIMESTAMP AS OF`, `SET 'paimon.scan.version'` + SELECT/INSERT, and that 
selecting a later-added column at an old snapshot fails at planning.
   - Existing time-travel tests (conflicting/invalid selector behavior) pass 
unchanged.
   
   ### API and Format
   
   New public APIs: `Table::copy_with_time_travel`, `Table::is_time_traveled`, 
`TableSchema::copy_with_replaced_options`. No storage format change.
   
   Scope notes (deliberate, follow-ups welcome):
   - Selector coverage stays the existing Rust subset (`scan.version`, 
`scan.timestamp-millis`); Java additionally supports `scan.snapshot-id` / 
`scan.tag-name` / `scan.watermark` / `scan.timestamp`.
   - `FileSystemCatalog::get_table` does not auto-travel selectors persisted in 
table options (Java `FileStoreTableFactory.create` does); only dynamic entry 
points are covered here.
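
As a rough illustration of the selector subset noted above, the sketch below reads the two supported option keys into a selector value. The `Selector` enum, `parse_selector`, and the error messages are hypothetical. Treating "both keys set" as an error is an assumption of this sketch, not a statement about how paimon-rust reports conflicting selectors.

```rust
use std::collections::HashMap;

const SCAN_VERSION: &str = "scan.version";
const SCAN_TIMESTAMP_MILLIS: &str = "scan.timestamp-millis";

/// Hypothetical selector type covering the two keys supported today.
#[derive(Debug, PartialEq)]
enum Selector {
    /// Raw `scan.version` value (a snapshot id or a tag name).
    Version(String),
    /// Parsed `scan.timestamp-millis` value.
    TimestampMillis(i64),
}

/// Returns `Ok(None)` when no selector is present, and an error when the
/// timestamp is not an integer or both keys are set (sketch assumption).
fn parse_selector(options: &HashMap<String, String>) -> Result<Option<Selector>, String> {
    match (options.get(SCAN_VERSION), options.get(SCAN_TIMESTAMP_MILLIS)) {
        (None, None) => Ok(None),
        (Some(version), None) => Ok(Some(Selector::Version(version.clone()))),
        (None, Some(ts)) => ts
            .parse::<i64>()
            .map(|ms| Some(Selector::TimestampMillis(ms)))
            .map_err(|e| format!("invalid {}: {}", SCAN_TIMESTAMP_MILLIS, e)),
        (Some(_), Some(_)) => Err(format!(
            "only one of {} and {} may be set",
            SCAN_VERSION, SCAN_TIMESTAMP_MILLIS
        )),
    }
}
```

Supporting further Java selectors such as `scan.snapshot-id` or `scan.tag-name` would, in a design like this, mean adding enum variants and match arms rather than changing callers.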
   
   ### Documentation
   
   None required.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
