TheR1sing3un opened a new pull request, #7802:
URL: https://github.com/apache/paimon/pull/7802

   ## Purpose
   
   Add time-travel support to the top-level ``pypaimon.ray.read_paimon`` API,
   so a Ray scan can read a specific snapshot id or a named tag.
   
   ## Why
   
   Before this PR, ``read_paimon`` always read the latest snapshot — there
   was no way to reproduce a scan against a fixed point in history through
   the recommended public facade. Internally pypaimon already understood
   ``scan.tag-name`` (added with #7243), but the matching ``scan.snapshot-id``
   plumbing was missing on the Python side even though the option exists in
   Java's ``CoreOptions.SCAN_SNAPSHOT_ID``.
   
   ## What changed
   
   **Public API** — ``pypaimon/ray/ray_paimon.py``:
   - ``read_paimon(..., snapshot_id=None, tag_name=None)``
   - Both are keyword-only and mutually exclusive (``ValueError`` if both set)
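
The guard described above could look roughly like the following. This is a minimal sketch, not the code from the PR: ``check_time_travel_args`` is a hypothetical helper name and the error message is an assumption. Only the keyword-only parameters and the ``ValueError`` come from the description.

```python
from typing import Optional


def check_time_travel_args(
    *, snapshot_id: Optional[int] = None, tag_name: Optional[str] = None
) -> None:
    """Hypothetical helper: reject a call that pins both a snapshot id and a tag."""
    # The bare `*` makes both parameters keyword-only, as the PR describes.
    if snapshot_id is not None and tag_name is not None:
        # Message text is an assumption; the PR only states that ValueError is raised.
        raise ValueError("snapshot_id and tag_name are mutually exclusive")
```

Comparing against ``None`` rather than testing truthiness keeps a snapshot id of ``0`` or an empty tag name from silently bypassing the check.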
   
   **Backing plumbing**:
   - ``pypaimon/read/datasource/split_provider.py``: ``CatalogSplitProvider``
     takes the two new fields and applies them via
     ``table.copy({"scan.snapshot-id": ..., "scan.tag-name": ...})`` in
     ``_ensure_table``. The same mutual-exclusion guard is repeated here as a
     defense-in-depth layer.
   - ``pypaimon/common/options/core_options.py``: new ``SCAN_SNAPSHOT_ID``
     config (long type, no default), aligned with Java's
     ``CoreOptions.SCAN_SNAPSHOT_ID``.
   - ``pypaimon/snapshot/time_travel_util.py``: ``try_travel_to_snapshot`` now
     accepts a ``snapshot_manager`` and resolves ``scan.snapshot-id`` against it.
   - ``pypaimon/read/table_scan.py``: ``_create_file_scanner`` routes
     ``SCAN_SNAPSHOT_ID`` through ``snapshot_manager.get_snapshot_by_id`` +
     ``manifest_list_manager.read_all``, mirroring the existing
     ``SCAN_TAG_NAME`` branch.
   - Existing ``TimeTravelUtil`` callers (full-text scan, vector search scan)
     are updated to pass the snapshot manager.
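
To illustrate the flow in the list above, here is a self-contained sketch of how the two arguments might become scan options and how a snapshot id might then be resolved. It is not the PR's implementation: ``time_travel_options``, ``resolve_snapshot`` and ``FakeSnapshotManager`` are hypothetical stand-ins, and storing the id as a string is an assumption. Only the option keys and the ``get_snapshot_by_id`` method name come from the description.

```python
from typing import Dict, Optional

# Option keys as named in the PR description.
SCAN_SNAPSHOT_ID = "scan.snapshot-id"
SCAN_TAG_NAME = "scan.tag-name"


def time_travel_options(
    snapshot_id: Optional[int] = None, tag_name: Optional[str] = None
) -> Dict[str, str]:
    """Hypothetical: build the dynamic options a provider would pass to table.copy()."""
    if snapshot_id is not None and tag_name is not None:
        raise ValueError("snapshot_id and tag_name are mutually exclusive")
    options: Dict[str, str] = {}
    if snapshot_id is not None:
        # Assumption: option values are stored as strings.
        options[SCAN_SNAPSHOT_ID] = str(snapshot_id)
    if tag_name is not None:
        options[SCAN_TAG_NAME] = tag_name
    return options


class FakeSnapshotManager:
    """Stand-in for the real snapshot manager; holds snapshots in a dict."""

    def __init__(self, snapshots: Dict[int, str]):
        self._snapshots = snapshots

    def get_snapshot_by_id(self, snapshot_id: int) -> Optional[str]:
        return self._snapshots.get(snapshot_id)


def resolve_snapshot(options: Dict[str, str], snapshot_manager) -> Optional[str]:
    """Hypothetical: resolve scan.snapshot-id against a snapshot manager."""
    raw = options.get(SCAN_SNAPSHOT_ID)
    if raw is None:
        return None  # no snapshot-id time travel requested
    if snapshot_manager is None:
        raise ValueError("a snapshot manager is required to resolve scan.snapshot-id")
    snapshot = snapshot_manager.get_snapshot_by_id(int(raw))
    if snapshot is None:
        raise ValueError(f"snapshot {raw} does not exist")
    return snapshot
```

With a manager built as ``FakeSnapshotManager({1: "snap-1", 2: "snap-2"})``, calling ``resolve_snapshot(time_travel_options(snapshot_id=2), manager)`` returns the entry stored under id 2, while an unknown id or a missing manager raises, matching the failure cases the tests in this PR are said to cover.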
   
   **Docs** — ``docs/content/pypaimon/ray-data.md``: added a ``Time travel``
   example block and parameter docs.
   
   ## Tests
   
   - ``time_travel_util_test.py`` (new, 6 cases): SCAN_KEYS contents,
     snapshot-id resolution, missing-id raise, missing-snapshot-manager raise,
     mutual exclusion at the util layer.
   - ``split_provider_test.py`` (+3 cases): provider-level snapshot-id /
     tag-name time travel + ctor mutual-exclusion guard.
   - ``ray_integration_test.py`` (+3 cases): ``read_paimon`` end-to-end with
     ``snapshot_id`` / ``tag_name``, plus the public mutual-exclusion guard.
   
   All read-path regression tests still pass (57/57 across reader-pk,
   reader-append-only, projection, time-travel, ray integration).
   
   ## Out of scope
   
   - Streaming time travel (``scan.mode=from-snapshot`` etc.) is unchanged.
   - Java side already has ``CoreOptions.SCAN_SNAPSHOT_ID``; no Java changes
     needed.

