[
https://issues.apache.org/jira/browse/SPARK-57189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Norio Akagi updated SPARK-57189:
--------------------------------
Description:
For requests originating from Spark Declarative Pipelines (SDP),
SparkConnectPlanner.handleSqlCommand calls
PipelinesHandler.blockUnsupportedSqlCommand with a queryPlan built via
transformRelation(relation). When the relation is a WITH_RELATIONS
that satisfies isValidSQLWithRefs, transformRelation follows this call
chain:
  transformRelation -> transformWithRelations -> transformSqlWithRefs
  -> executeSQLWithRefs -> executeSQL -> session.sql(...)
A comment in executeSQLWithRefs reads "Eagerly execute commands of the
provided SQL string", and session.sql does execute any Command/DDL/DML
in the root SQL. Commands embedded in reference SubqueryAlias inputs
also execute when eagerlyExecuteCommands walks the resolved plan tree.
This causes two issues:
1. Bypassed guard. blockUnsupportedSqlCommand checks whether queryPlan
   is a Command subclass (CreateTableAsSelect, InsertIntoStatement,
   etc.). After execution, the resulting plan is wrapped as
   CommandResult, which is not in the blocklist. The guard silently
   lets through exactly the Commands it is supposed to block, and those
   Commands have already mutated state by the time the guard runs.
2. Double execution. After the guard, handleSqlCommand falls through to
   the normal execution path, which calls executeSQLWithRefs again. Any
   DDL/DML in the request runs twice, causing duplicate side effects.
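The two issues can be reproduced in a toy model that does not use Spark
at all. Everything in this sketch (Plan, ToySession, guard_blocks,
handle_sql_command) is a hypothetical stand-in chosen for illustration,
not the actual Spark Connect code; it only assumes the behavior
described above: building the plan executes commands eagerly, and the
guard recognizes only unexecuted commands.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    # Hypothetical plan node: "query", "command", or "command_result".
    kind: str
    name: str = ""


@dataclass
class ToySession:
    # Log of side effects, one entry per executed command.
    executed: list = field(default_factory=list)

    def sql(self, plan: Plan) -> Plan:
        """Mimic eager execution: a command runs as soon as sql() is called."""
        if plan.kind == "command":
            self.executed.append(plan.name)
            return Plan("command_result", plan.name)
        return plan


def guard_blocks(plan: Plan) -> bool:
    """Mimic a blocklist that only matches unexecuted commands."""
    return plan.kind == "command"


def handle_sql_command(session: ToySession, parsed: Plan) -> bool:
    """Return True if the guard blocked the request."""
    plan_for_guard = session.sql(parsed)  # first execution, before the guard
    if guard_blocks(plan_for_guard):
        return True
    session.sql(parsed)  # normal execution path runs the command again
    return False


session = ToySession()
blocked = handle_sql_command(session, Plan("command", "INSERT INTO t"))
print(blocked, session.executed)
```

In this model the guard never fires, because it only ever sees the
"command_result" wrapper, and the side-effect log ends up with two
entries for a single request.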
The guard should match the runtime's execution surface: inspect both
the root SQL and each reference's input, without itself triggering any
execution.
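A minimal sketch of such a guard, again as a toy model rather than
Spark code: it assumes the root SQL and each reference input are
available as parsed but unexecuted plan trees. The names Node,
SqlWithRefs, UnsupportedCommandError, and check_no_commands are
hypothetical and do not correspond to any Spark API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical parsed (not executed) plan node.
    label: str
    is_command: bool = False
    children: List["Node"] = field(default_factory=list)


@dataclass
class SqlWithRefs:
    # Hypothetical request shape: root SQL plan plus reference inputs.
    root: Node
    references: List[Node] = field(default_factory=list)


class UnsupportedCommandError(Exception):
    pass


def find_command(node: Node):
    """Depth-first search for a command node; read-only, executes nothing."""
    if node.is_command:
        return node
    for child in node.children:
        found = find_command(child)
        if found is not None:
            return found
    return None


def check_no_commands(request: SqlWithRefs) -> None:
    """Inspect the root and every reference input before any execution."""
    for plan in [request.root, *request.references]:
        found = find_command(plan)
        if found is not None:
            raise UnsupportedCommandError(found.label)


# A command hidden inside a reference input is rejected up front.
request = SqlWithRefs(
    root=Node("SELECT * FROM ref"),
    references=[Node("SubqueryAlias ref", children=[Node("INSERT", True)])],
)
try:
    check_no_commands(request)
    rejected = None
except UnsupportedCommandError as err:
    rejected = str(err)
print(rejected)
```

Because the check only reads the tree, it can run before the normal
execution path without contributing a second execution of its own.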
> handleSqlCommand executes SQL twice and lets blocked Commands bypass the SDP
> guard for WITH_RELATIONS
> -----------------------------------------------------------------------------------------------------
>
> Key: SPARK-57189
> URL: https://issues.apache.org/jira/browse/SPARK-57189
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 5.0.0
> Reporter: Norio Akagi
> Priority: Major