[ https://issues.apache.org/jira/browse/SPARK-53339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-53339:
-----------------------------------
    Labels: pull-request-available  (was: )

> Fix a race condition issue which occurs when an operation in pending state is interrupted
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-53339
>                 URL: https://issues.apache.org/jira/browse/SPARK-53339
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 4.1.0
>            Reporter: Kousuke Saruta
>            Priority: Major
>              Labels: pull-request-available
>
> When an operation in pending state is concurrently interrupted, the interruption is not handled correctly, and the interrupted query can fail with an unexpected internal error (see the test failure below).
> You can easily reproduce this issue by modifying SparkConnectExecutionManager#createExecuteHolderAndAttach as follows (the added sleep keeps the operation in pending state for one second before it is started):
> {code}
>      val executeHolder = createExecuteHolder(executeKey, request, sessionHolder)
>      try {
> +      Thread.sleep(1000)
>        executeHolder.eventsManager.postStarted()
>        executeHolder.start()
>      } catch {
> {code}
> Then run the test "interrupt all - background queries, foreground interrupt" in SparkSessionE2ESuite.
> {code}
> $ build/sbt 'connect-client-jvm/testOnly org.apache.spark.sql.connect.SparkSessionE2ESuite -- -z "interrupt all - background queries, foreground interrupt"'
> {code}
> You will see the following error.
> {code}
> [info] - interrupt all - background queries, foreground interrupt *** FAILED *** (20 seconds, 344 milliseconds)
> [info] The code passed to eventually never returned normally. Attempted 28 times over 20.285258458 seconds. Last failure message: Some("unexpected failure in q2: org.apache.spark.SparkException: java.lang.IllegalStateException: Operation was orphaned because of an internal error.") was not empty Error not empty: Some(unexpected failure in q2: org.apache.spark.SparkException: java.lang.IllegalStateException: Operation was orphaned because of an internal error.). (SparkSessionE2ESuite.scala:72)
> [info] org.scalatest.exceptions.TestFailedDueToTimeoutException:
> [info] at org.scalatest.enablers.Retrying$$anon$4.tryTryAgain$2(Retrying.scala:219)
> [info] at org.scalatest.enablers.Retrying$$anon$4.retry(Retrying.scala:226)
> [info] at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:313)
> [info] at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:312)
> [info] at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:457)
> [info] at org.apache.spark.sql.connect.SparkSessionE2ESuite.$anonfun$new$1(SparkSessionE2ESuite.scala:72)
> {code}
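The issue does not describe the fix. As a minimal sketch of the general technique only, assuming a simplified operation lifecycle, the Scala below shows one common way to keep an interrupt that arrives while an operation is still pending from being lost: every state transition goes through `AtomicReference.compareAndSet`, so `start()` refuses to launch work once `interrupt()` has won. `OpState`, `OperationSketch`, and `raceOnce` are hypothetical names for illustration; they are not Spark Connect's classes or the actual change for SPARK-53339.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

// Hypothetical lifecycle states for illustration; not Spark Connect's types.
sealed trait OpState
case object Pending extends OpState
case object Running extends OpState
case object Interrupted extends OpState
case object Finished extends OpState

// Hypothetical operation holder. Every transition goes through compareAndSet,
// so start() and interrupt() racing on the same state cannot both succeed.
final class OperationSketch {
  private val state = new AtomicReference[OpState](Pending)

  def current: OpState = state.get()

  // Pending -> Running. Returns false if an interrupt already won the race,
  // in which case the caller must not launch the work.
  def start(): Boolean = state.compareAndSet(Pending, Running)

  // Pending or Running -> Interrupted. An interrupt that arrives while the
  // operation is still pending is recorded rather than lost.
  @tailrec
  def interrupt(): Boolean = state.get() match {
    case s @ (Pending | Running) =>
      // Retry if another thread changed the state between get() and the CAS.
      if (state.compareAndSet(s, Interrupted)) true else interrupt()
    case _ => false // already interrupted or finished
  }

  // Running -> Finished; returns false unless the operation is Running
  // (for example, because it was interrupted first).
  def finish(): Boolean = state.compareAndSet(Running, Finished)
}

// Races start() against interrupt() on two threads and reports
// (start result, interrupt result, final state).
def raceOnce(): (Boolean, Boolean, OpState) = {
  val op = new OperationSketch
  var started = false
  var interrupted = false
  val t1 = new Thread(() => { started = op.start() })
  val t2 = new Thread(() => { interrupted = op.interrupt() })
  t1.start(); t2.start()
  // join() establishes happens-before, so the threads' writes are visible here.
  t1.join(); t2.join()
  (started, interrupted, op.current)
}
```

In this sketch, `raceOnce()` always ends in `Interrupted` with `interrupt()` returning `true`, while `start()` returns `true` or `false` depending on which thread wins. An interrupt that lands in the window the `Thread.sleep(1000)` opens in the reproduction simply makes `start()` return `false`, instead of leaving the operation in an inconsistent state.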