Juliusz Sompolski created SPARK-44835:
-----------------------------------------
Summary: SparkConnect ReattachExecute could raise before
ExecutePlan even attaches.
Key: SPARK-44835
URL: https://issues.apache.org/jira/browse/SPARK-44835
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski
If a ReattachExecute is sent very quickly after ExecutePlan, the following
could happen:
* ExecutePlan has not yet reached
*executeHolder.runGrpcResponseSender(responseSender)* in
SparkConnectExecutePlanHandler.
* ReattachExecute races around and reaches
*executeHolder.runGrpcResponseSender(responseSender)* in
SparkConnectReattachExecuteHandler first.
* When ExecutePlan reaches
*executeHolder.runGrpcResponseSender(responseSender)* and
executionObserver.attachConsumer(this) is called in the
ExecuteGrpcResponseSender of ExecutePlan, it will kick out the
ExecuteGrpcResponseSender of ReattachExecute.
So even though ReattachExecute came later, it will get interrupted by the
earlier ExecutePlan and finish with a *SparkSQLException(errorClass =
"INVALID_CURSOR.DISCONNECTED", Map.empty)* (which was assumed to be a situation
where a stale hanging RPC is replaced by a reconnection).
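
The sequence above can be illustrated with a small deterministic model. This is
only a sketch in Python, not the Spark Connect server code: the classes
ResponseSender and ExecutionObserver, the method attach_consumer and the helper
run are hypothetical stand-ins, and the one rule the model assumes is the one
described above, that attaching a new consumer kicks out the previously
attached one.

```python
class DisconnectedError(Exception):
    """Hypothetical stand-in for SparkSQLException with
    errorClass = "INVALID_CURSOR.DISCONNECTED"."""


class ResponseSender:
    """Hypothetical model of one RPC's response sender."""

    def __init__(self, name):
        self.name = name
        self.interrupted = False

    def interrupt(self):
        self.interrupted = True

    def finish(self):
        # A sender that was kicked out ends with the disconnected error.
        if self.interrupted:
            raise DisconnectedError(self.name)
        return "completed"


class ExecutionObserver:
    """Hypothetical model: only one consumer may be attached at a time."""

    def __init__(self):
        self.consumer = None

    def attach_consumer(self, sender):
        # Assumption taken from the description: the most recent attach
        # wins and the previously attached sender is interrupted.
        if self.consumer is not None:
            self.consumer.interrupt()
        self.consumer = sender


def run(attach_order):
    """Attach the senders in the given order and report how each RPC ends."""
    observer = ExecutionObserver()
    senders = [ResponseSender(name) for name in attach_order]
    for sender in senders:
        observer.attach_consumer(sender)
    outcome = {}
    for sender in senders:
        try:
            outcome[sender.name] = sender.finish()
        except DisconnectedError:
            outcome[sender.name] = "INVALID_CURSOR.DISCONNECTED"
    return outcome


# Intended order: the stale ExecutePlan is replaced by the reconnection.
intended = run(["ExecutePlan", "ReattachExecute"])
# Raced order: ReattachExecute attaches first and is kicked out instead.
raced = run(["ReattachExecute", "ExecutePlan"])
```

In this model the outcome depends only on the order of the attach calls, not on
the order in which the RPCs were sent, which is the problem reported here.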
This is very unlikely to happen in practice, because ExecutePlan
shouldn't be abandoned so fast, but because of
https://issues.apache.org/jira/browse/SPARK-44833 it is slightly more likely
(though there is also a 50ms sleep before retry there, which again makes it
unlikely).