TheR1sing3un opened a new pull request, #379: URL: https://github.com/apache/paimon-rust/pull/379
### Purpose

Time travel (`scan.version` / `scan.timestamp-millis` / SQL `VERSION AS OF` / `TIMESTAMP AS OF`) previously only switched which snapshot was scanned, while the table schema, scan pruning, the read evolution target, and the DataFusion provider all kept using the **latest** schema; `Snapshot.schemaId` was never consumed on the read path. Reading an old snapshot therefore lost its historical shape: columns dropped later were invisible, and type updates were applied retroactively to historical data.

Java switches the table to the snapshot's schema in `AbstractFileStoreTable.copy(dynamicOptions)` → `tryTimeTravel` → `schemaManager.schema(snapshot.schemaId()).copy(mergedOptions)`. This PR mirrors that behavior.

### Brief change log

- Add async `Table::copy_with_time_travel(extra)`, mirroring Java `copy(dynamicOptions)`: merge options, resolve the time-travel selector, and when the resolved snapshot has a different schema id, replace the table schema with the snapshot's schema while keeping the merged options. Resolution failures fall back silently (Java `tryTimeTravel` catch-all); invalid selectors still fail at scan planning, so existing error behavior is unchanged. The existing `Table::copy_with_options` keeps its non-traveling semantics (Java `copyWithoutTimeTravel`).
- Add `TableSchema::copy_with_replaced_options`, matching Java `TableSchema.copy(Map)` (options are replaced, not merged; id/fields/keys/comment/timeMillis preserved).
- Extract snapshot resolution from `TableScan::resolve_snapshot` into `table::time_travel::travel_to_snapshot` (Java `TimeTravelUtil`) and share it.
- Wire up the DataFusion entry points: the `SQLContext` time-travel rewrite path, the catalog provider's dynamic-options path (`SET 'paimon.scan.version'`), and `PaimonRelationPlanner` (bridged via the existing `block_on_with_runtime`, since the planner hook is synchronous).
- Reject `new_write()` on a time-travelled table copy. Java has no runtime guard but avoids the situation structurally (write paths always use `copyWithoutTimeTravel`), whereas the shared DataFusion provider here can serve both reads and INSERT, so an explicit error is safer than silently writing data shaped like the old schema. This is isolated and easy to drop if undesired.

Because the schema switch happens at the `Table` level, scan stats pruning, the per-file evolution target, and the provider's Arrow schema stay consistent automatically; the existing field-id based stats devolution is unaffected (files in an old snapshot always have `schema_id <=` the snapshot's schema id).

### Tests

- New unit tests in `table::time_travel` (multi-schema fixture built by persisting `schema-0`/`schema-1` and committing one snapshot per version): schema switch by version/tag/timestamp, no-selector no-op, silent fallback for invalid/conflicting selectors, merged-options replacement semantics, write rejection, and an end-to-end read asserting the old snapshot returns only the old columns.
- New `time_travel_schema_tests` in paimon-datafusion: `VERSION AS OF` (SQLContext path and the relation-planner path on a raw `SessionContext`), `TIMESTAMP AS OF`, `SET 'paimon.scan.version'` + SELECT/INSERT, and that selecting a later-added column at an old snapshot fails at planning.
- Existing time-travel tests (conflicting/invalid selector behavior) pass unchanged.

### API and Format

New public APIs: `Table::copy_with_time_travel`, `Table::is_time_traveled`, `TableSchema::copy_with_replaced_options`. No storage format change.

Scope notes (deliberate, follow-ups welcome):

- Selector coverage stays the existing Rust subset (`scan.version`, `scan.timestamp-millis`); Java additionally supports `scan.snapshot-id` / `scan.tag-name` / `scan.watermark` / `scan.timestamp`.
- `FileSystemCatalog::get_table` does not auto-travel selectors persisted in table options (Java `FileStoreTableFactory.create` does); only dynamic entry points are covered here.
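
For readers who want the schema-switch rule from the change log in runnable form, here is a minimal sketch on toy types. It is not the code in this PR: the `Store` struct, the `fixture` helper, the synchronous signature, and treating `scan.version` as a plain numeric snapshot id are all simplifying assumptions, and the real `Table::copy_with_time_travel` is async and resolves snapshots through `travel_to_snapshot`. Setting the time-travelled flag whenever a selector resolves is also an assumption of the sketch.

```rust
use std::collections::HashMap;

// Toy stand-ins for illustration only; these are NOT the paimon-rust types.
#[derive(Clone, Debug, PartialEq)]
struct TableSchema {
    id: i64,
    fields: Vec<String>,
    options: HashMap<String, String>,
}

impl TableSchema {
    /// Options are replaced, not merged; id and fields are preserved.
    fn copy_with_replaced_options(&self, options: HashMap<String, String>) -> Self {
        TableSchema { id: self.id, fields: self.fields.clone(), options }
    }
}

struct Snapshot {
    id: i64,
    schema_id: i64,
}

/// Hypothetical in-memory replacement for the snapshot and schema managers.
struct Store {
    snapshots: Vec<Snapshot>,
    schemas: HashMap<i64, TableSchema>,
}

struct Table {
    schema: TableSchema,
    time_traveled: bool,
}

impl Table {
    /// Merge options, resolve the selector, and switch to the snapshot's schema.
    fn copy_with_time_travel(&self, store: &Store, extra: &HashMap<String, String>) -> Table {
        // 1. Dynamic options override the table's own options.
        let mut merged = self.schema.options.clone();
        for (k, v) in extra {
            merged.insert(k.clone(), v.clone());
        }
        // 2. Resolve the selector; any failure becomes None (silent fallback).
        let snapshot = merged
            .get("scan.version")
            .and_then(|v| v.parse::<i64>().ok())
            .and_then(|id| store.snapshots.iter().find(|s| s.id == id));
        // 3. Use the snapshot's schema only when its id differs from the current one.
        let target = snapshot
            .and_then(|s| store.schemas.get(&s.schema_id))
            .filter(|s| s.id != self.schema.id)
            .unwrap_or(&self.schema);
        Table {
            schema: target.copy_with_replaced_options(merged),
            // Assumption of this sketch: flagged whenever a selector resolved.
            time_traveled: snapshot.is_some(),
        }
    }

    /// Writes are rejected on a time-travelled copy.
    fn new_write(&self) -> Result<(), String> {
        if self.time_traveled {
            return Err("cannot write through a time-travelled table copy".to_string());
        }
        Ok(())
    }
}

/// Two schemas (the `legacy` column is dropped in schema 1), one snapshot each.
fn fixture() -> (Table, Store) {
    let mut stale = HashMap::new();
    stale.insert("stale.key".to_string(), "x".to_string());
    let s0 = TableSchema {
        id: 0,
        fields: vec!["id".to_string(), "legacy".to_string()],
        options: stale,
    };
    let s1 = TableSchema { id: 1, fields: vec!["id".to_string()], options: HashMap::new() };
    let mut schemas = HashMap::new();
    schemas.insert(0, s0);
    schemas.insert(1, s1.clone());
    let snapshots = vec![
        Snapshot { id: 1, schema_id: 0 },
        Snapshot { id: 2, schema_id: 1 },
    ];
    (Table { schema: s1, time_traveled: false }, Store { snapshots, schemas })
}
```

Calling `copy_with_time_travel` on the fixture table with `scan.version = 1` yields a copy whose schema has id 0 and still contains the `legacy` column, carries only the merged options (the options stored with schema 0 are discarded), and rejects `new_write()`. An unresolvable version such as `999` leaves the latest schema in place and remains writable.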
### Documentation

None required.
