[ https://issues.apache.org/jira/browse/SPARK-57189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Norio Akagi updated SPARK-57189:
--------------------------------
    Description: 
  For requests originating from Spark Declarative Pipelines (SDP),
  SparkConnectPlanner.handleSqlCommand calls
  PipelinesHandler.blockUnsupportedSqlCommand with a queryPlan built via
  transformRelation(relation). When the relation is a WITH_RELATIONS
  relation that matches isValidSQLWithRefs, this transformation chain
  leads to:

    transformRelation -> transformWithRelations -> transformSqlWithRefs
      -> executeSQLWithRefs -> executeSQL -> session.sql(...)

  A comment in executeSQLWithRefs explicitly says "Eagerly execute
  commands of the provided SQL string", and session.sql does trigger
  actual execution of any Command/DDL/DML in the root SQL. Commands
  embedded in the SubqueryAlias inputs of the references also execute when
  eagerlyExecuteCommands walks the resolved plan tree.
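
  To make the eager path concrete, here is a toy Scala model. Everything in
  it is hypothetical: Plan, Cmd, WithRefs and the other types are
  simplified stand-ins named after the Spark classes, not Spark's own
  types, and transformRelation/eagerlyExecuteCommands only mimic the
  behaviour described above. The model shows that merely building
  queryPlan runs the Commands in both the root SQL and a reference input,
  and that they come back as CommandResult nodes.

```scala
import scala.collection.mutable.ListBuffer

object EagerModel {
  // Hypothetical stand-ins for logical plan nodes; these are NOT Spark's classes.
  sealed trait Plan
  final case class Query(sql: String) extends Plan
  final case class Cmd(name: String) extends Plan          // any DDL/DML Command
  final case class CommandResult(ran: String) extends Plan // what an executed Cmd becomes
  final case class SubqueryAlias(alias: String, child: Plan) extends Plan
  final case class WithRefs(root: Plan, refs: Seq[SubqueryAlias]) extends Plan

  // Side-effect log, so we can observe when a command actually runs.
  val sideEffects: ListBuffer[String] = ListBuffer.empty[String]

  // Mimics eagerlyExecuteCommands: walks the tree and runs every Cmd in place.
  def eagerlyExecuteCommands(plan: Plan): Plan = plan match {
    case Cmd(name) =>
      sideEffects += name
      CommandResult(name)
    case SubqueryAlias(alias, child) =>
      SubqueryAlias(alias, eagerlyExecuteCommands(child))
    case WithRefs(root, refs) =>
      val newRoot = eagerlyExecuteCommands(root) // in this model the root runs first
      val newRefs = refs.map(r => SubqueryAlias(r.alias, eagerlyExecuteCommands(r.child)))
      WithRefs(newRoot, newRefs)
    case other => other
  }

  // Mimics transformRelation -> ... -> session.sql(...): building the plan runs it.
  def transformRelation(rel: WithRefs): Plan = eagerlyExecuteCommands(rel)
}
```

  The ordering of root versus references is an artifact of this model; the
  sketch does not claim to reproduce the order Spark uses internally.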

  This causes two issues:

  1. Bypassed guard. blockUnsupportedSqlCommand checks whether queryPlan
     is a Command subclass (CreateTableAsSelect, InsertIntoStatement,
     etc.). After execution, however, the resulting plan is wrapped as a
     CommandResult, which is not in the blocklist. The guard therefore
     silently lets through exactly the plans it is meant to block, and by
     the time it runs those Commands have already mutated state.

  2. Double execution. After the guard, handleSqlCommand falls through to
     the normal execution path, which calls executeSQLWithRefs again. Any
     DDL/DML in the request therefore runs twice, causing duplicate side
     effects.
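
  Both issues can be reproduced in a small hypothetical Scala model. The
  types and methods below are simplified stand-ins, not Spark's
  PipelinesHandler or planner code, and the blocklist holds only the two
  Command classes named above. The guard correctly rejects a raw
  CreateTableAsSelect, yet handleSqlCommand accepts the same request
  because the guard only ever sees a CommandResult, and the execution log
  records the command running twice.

```scala
import scala.collection.mutable.ListBuffer

object GuardModel {
  // Hypothetical stand-ins; NOT Spark's classes or PipelinesHandler's real code.
  sealed trait Plan
  sealed trait Command extends Plan
  final case class CreateTableAsSelect(table: String) extends Command
  final case class InsertIntoStatement(table: String) extends Command
  final case class CommandResult(ran: String) extends Plan

  val executions: ListBuffer[String] = ListBuffer.empty[String]

  // Mimics executeSQLWithRefs: runs a Command and returns its CommandResult.
  def executeSqlWithRefs(plan: Plan): Plan = plan match {
    case c: Command =>
      executions += c.toString
      CommandResult(c.toString)
    case other => other
  }

  // Mimics the blocklist: it only recognises Command subclasses.
  def isBlocked(plan: Plan): Boolean = plan match {
    case _: CreateTableAsSelect | _: InsertIntoStatement => true
    case _ => false
  }

  // Mimics the flow described above for an SDP request.
  def handleSqlCommand(request: Plan): Either[String, Plan] = {
    val queryPlan = executeSqlWithRefs(request) // run #1, before the guard
    if (isBlocked(queryPlan)) Left("blocked")   // queryPlan is a CommandResult: never blocked
    else Right(executeSqlWithRefs(request))     // run #2 on the normal path
  }
}
```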

  The guard should match the runtime's execution surface: inspect both
  the root SQL and each reference's input, without itself triggering any
  execution.
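
  One possible shape for such a guard, sketched under assumptions: the
  request is modelled as already-parsed but unexecuted plans for the root
  SQL and for each reference (SqlWithRefs is a hypothetical type, as are
  the other classes), and the check is a pure function that matches them
  against the blocklist. It only looks through SubqueryAlias here; a real
  check would need to cover the full plan tree and the complete blocklist.

```scala
object SafeGuardModel {
  // Hypothetical stand-ins for parsed (not executed) plans; NOT Spark's classes.
  sealed trait Plan
  sealed trait Command extends Plan
  final case class CreateTableAsSelect(table: String) extends Command
  final case class InsertIntoStatement(table: String) extends Command
  final case class Select(sql: String) extends Plan
  final case class SubqueryAlias(alias: String, child: Plan) extends Plan

  // Hypothetical: the unexecuted root plan plus each reference's input.
  final case class SqlWithRefs(root: Plan, refs: Seq[SubqueryAlias])

  // Blocklist check that looks through reference aliases to their inputs.
  private def isBlocked(plan: Plan): Boolean = plan match {
    case _: CreateTableAsSelect | _: InsertIntoStatement => true
    case SubqueryAlias(_, child) => isBlocked(child)
    case _ => false
  }

  // Pure check over the root and every reference; nothing is executed.
  def blockUnsupported(req: SqlWithRefs): Either[String, SqlWithRefs] =
    (req.root +: req.refs).find(isBlocked) match {
      case Some(p) => Left(s"unsupported in pipelines: $p")
      case None => Right(req)
    }
}
```

  Because this check only pattern-matches, running it ahead of the normal
  execution path adds no side effects of its own.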


> handleSqlCommand executes SQL twice and lets blocked Commands bypass the SDP guard for WITH_RELATIONS
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-57189
>                 URL: https://issues.apache.org/jira/browse/SPARK-57189
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 5.0.0
>            Reporter: Norio Akagi
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
