Yicong Huang created SPARK-55505:
------------------------------------

             Summary: NumberFormatException in SQLExecution.withNewExecutionId0 
due to re-reading EXECUTION_ROOT_ID_KEY
                 Key: SPARK-55505
                 URL: https://issues.apache.org/jira/browse/SPARK-55505
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.2.0
            Reporter: Yicong Huang


{{SQLExecution.withNewExecutionId0}} can throw {{NumberFormatException: Cannot 
parse null string}} at the line:
{code:scala}
val rootExecutionId = sc.getLocalProperty(EXECUTION_ROOT_ID_KEY).toLong
{code}

The current code checks if {{EXECUTION_ROOT_ID_KEY}} is null, sets it if so, 
then *re-reads* it from local properties assuming it is non-null:
{code:scala}
if (sc.getLocalProperty(EXECUTION_ROOT_ID_KEY) == null) {
  sc.setLocalProperty(EXECUTION_ROOT_ID_KEY, executionId.toString)
  sc.addJobTag(executionIdJobTag(sparkSession, executionId))
}
val rootExecutionId = sc.getLocalProperty(EXECUTION_ROOT_ID_KEY).toLong // crashes here
{code}

This re-read can return null under high-concurrency scenarios involving nested 
thread pools (e.g., {{CrossValidator(parallelism=4)}} with 
{{OneVsRest(parallelism=2)}} running from a Python {{ThreadPoolExecutor}}).

The fix is to read the property once and use the value directly, avoiding the 
re-read:
{code:scala}
val existingRootId = sc.getLocalProperty(EXECUTION_ROOT_ID_KEY)
val rootExecutionId = if (existingRootId != null) {
  existingRootId.toLong
} else {
  sc.setLocalProperty(EXECUTION_ROOT_ID_KEY, executionId.toString)
  sc.addJobTag(executionIdJobTag(sparkSession, executionId))
  executionId
}
{code}
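
To make the difference between the two patterns concrete, here is a minimal, self-contained sketch. It does not use Spark: {{PropStore}}, {{MapStore}}, {{LossyStore}}, {{RootIdSketch}} and the key {{example.root.id}} are all hypothetical names introduced for illustration. {{LossyStore}} simply assumes that a write is not visible to the following read; the actual mechanism by which the property becomes null in Spark is not identified in this report.

```scala
import scala.collection.mutable

// Hypothetical stand-in for SparkContext local properties (not Spark's API).
trait PropStore {
  def get(key: String): String // returns null when the key is absent
  def set(key: String, value: String): Unit
}

// Well-behaved store: a write is visible to the next read.
final class MapStore extends PropStore {
  private val props = mutable.Map.empty[String, String]
  def get(key: String): String = props.getOrElse(key, null)
  def set(key: String, value: String): Unit = props.update(key, value)
}

// Assumed failure model: the write is not visible to a later read.
final class LossyStore extends PropStore {
  def get(key: String): String = null
  def set(key: String, value: String): Unit = ()
}

object RootIdSketch {
  val RootIdKey = "example.root.id" // illustrative key, not Spark's constant

  // Check, set, then re-read: parsing throws if the re-read yields null.
  def reread(store: PropStore, executionId: Long): Long = {
    if (store.get(RootIdKey) == null) store.set(RootIdKey, executionId.toString)
    store.get(RootIdKey).toLong
  }

  // Read once and reuse the value already in hand; no second lookup.
  def readOnce(store: PropStore, executionId: Long): Long = {
    val existing = store.get(RootIdKey)
    if (existing != null) existing.toLong
    else {
      store.set(RootIdKey, executionId.toString)
      executionId
    }
  }
}
```

With {{MapStore}} both functions behave the same. With {{LossyStore}}, {{reread}} throws {{NumberFormatException}} while {{readOnce}} still returns the execution ID, because it never depends on the second lookup.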

CI failure demonstrating the flaky test: [GitHub Actions Run|https://github.com/Yicong-Huang/spark/actions/runs/21961599500/attempts/1]
(attempt 1 failed, attempt 2 passed; failed test: {{test_save_load_pipeline_estimator}} in {{CrossValidatorIOPipelineTests}}).

Stack trace (abridged):
{code}
java.lang.NumberFormatException: Cannot parse null string
  at java.lang.Long.parseLong(Long.java:550)
  at SQLExecution.scala:115
{code}



