[ 
https://issues.apache.org/jira/browse/SPARK-57057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh updated SPARK-57057:
--------------------------------
    Description: 
Builds on SPARK-57056 (SupportsBranching DSv2 interface and branching
DDL). Once a data source can expose branches, users need a way to read
and write data against a specific branch.

This ticket adds two complementary mechanisms.

1. Per-query temporal clause

   Extend the existing temporalClause so it accepts a branch:

       SELECT * FROM t FOR BRANCH 'dev'
       SELECT * FROM t VERSION AS OF BRANCH 'dev'
       SELECT * FROM t SYSTEM_VERSION AS OF BRANCH 'dev'

       INSERT INTO t FOR BRANCH 'dev' SELECT ...
       INSERT OVERWRITE t FOR BRANCH 'dev' SELECT ...
       INSERT INTO t FOR BRANCH 'dev' REPLACE WHERE / REPLACE USING ...

   BRANCH is the only temporal variant allowed on writes. VERSION AS OF
   <int> and TIMESTAMP AS OF <ts> on writes remain rejected (existing
   Spark constraint, surfaced at parse time with a clearer error).

2. Session default branch

   New config:

       spark.sql.defaultBranch

   When non-empty, every read and write against a SupportsBranching
   table is routed to the named branch. Tables that do not implement
   SupportsBranching silently ignore the config. An explicit FOR BRANCH
   clause always overrides the config.

Precedence:
  1. Explicit FOR BRANCH / VERSION AS OF BRANCH in the query.
  2. spark.sql.defaultBranch.
  3. Today's behaviour (no branch targeting).
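
The precedence rules can be illustrated with a small standalone model. This is only a sketch in plain Python, not Spark code: the function name resolve_branch and its parameters are hypothetical, and the handling of an explicit branch on a table without SupportsBranching is not specified in this ticket, so the sketch leaves it to the caller.

```python
from typing import Optional


def resolve_branch(explicit_branch: Optional[str],
                   session_default: str,
                   supports_branching: bool) -> Optional[str]:
    """Return the branch a relation should target, or None for no branch.

    Hypothetical helper that models the precedence list above; it is
    not part of Spark's API.
    """
    # 1. An explicit FOR BRANCH / VERSION AS OF BRANCH clause always wins.
    #    (What happens if the table lacks SupportsBranching is not covered
    #    by the ticket text, so no check is made here.)
    if explicit_branch is not None:
        return explicit_branch
    # 2. A non-empty spark.sql.defaultBranch applies, but only to tables
    #    that implement SupportsBranching; others silently ignore it.
    if session_default and supports_branching:
        return session_default
    # 3. Otherwise keep today's behaviour: no branch targeting.
    return None
```

For example, resolve_branch(None, "staging", False) yields None, which models a non-branching table ignoring the session config.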

Implementation notes:
  * SupportsBranching gains loadBranch(name): Table.
  * TimeTravelSpec gains AsOfBranch(branch, isExplicit). RelationTimeTravel
    carries an optional branch field.
  * UnresolvedRelation carries the branch on writes via a reserved
    internal option (mirrors REQUIRED_WRITE_PRIVILEGES), so the
    NamedRelation slot in InsertIntoStatement / OverwriteByExpression
    is preserved.
  * CatalogV2Util.getTable composes loadTable + loadBranch, lifting the
    "no time travel on writes" assertion only for the branch case.
  * The default branch is applied only on the persistent relation
    resolution path; temp views are unaffected.
  * InMemoryTable.loadBranch returns an independent InMemoryTable per
    branch so reads and writes are isolated end-to-end in tests.
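
As a rough illustration of the loadTable + loadBranch composition and the per-branch isolation described in the notes above, here is a toy model in plain Python. It assumes nothing about the real Scala/Java code: the class, the create_branch helper, the dict-based catalog, and get_table are all hypothetical stand-ins.

```python
from typing import Dict, List, Optional


class InMemoryTable:
    """Toy stand-in for a table that supports branching (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: List[tuple] = []
        self._branches: Dict[str, "InMemoryTable"] = {}

    def create_branch(self, branch: str) -> None:
        # Each branch is backed by its own independent table object.
        self._branches[branch] = InMemoryTable(f"{self.name}@{branch}")

    def load_branch(self, branch: str) -> "InMemoryTable":
        # Raises KeyError if the branch was never created.
        return self._branches[branch]


def get_table(catalog: Dict[str, InMemoryTable],
              name: str,
              branch: Optional[str] = None) -> InMemoryTable:
    """Load the base table, then narrow to a branch when one is requested."""
    table = catalog[name]                 # models loadTable
    if branch is None:
        return table
    return table.load_branch(branch)      # models loadBranch


catalog = {"t": InMemoryTable("t")}
catalog["t"].create_branch("dev")

# A write routed to 'dev' is not visible when reading without a branch.
get_table(catalog, "t", "dev").rows.append((1,))
```

After the last line, the rows of get_table(catalog, "t") are still empty while the 'dev' branch holds the inserted row, which is the isolation property the test table is meant to provide.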

> Allow SELECT and INSERT to target a specific branch on SupportsBranching 
> tables
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-57057
>                 URL: https://issues.apache.org/jira/browse/SPARK-57057
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major


