[ 
https://issues.apache.org/jira/browse/SPARK-55505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-55505:
-----------------------------------
    Labels: pull-request-available  (was: )

> NumberFormatException in SQLExecution.withNewExecutionId0 due to re-reading EXECUTION_ROOT_ID_KEY
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-55505
>                 URL: https://issues.apache.org/jira/browse/SPARK-55505
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Yicong Huang
>            Priority: Minor
>              Labels: pull-request-available
>
> {{SQLExecution.withNewExecutionId0}} can throw
> {{NumberFormatException: Cannot parse null string}} at the line:
> {code:scala}
> val rootExecutionId = sc.getLocalProperty(EXECUTION_ROOT_ID_KEY).toLong
> {code}
> The current code checks whether {{EXECUTION_ROOT_ID_KEY}} is null, sets it if
> so, and then *re-reads* it from local properties, assuming it is non-null:
> {code:scala}
> if (sc.getLocalProperty(EXECUTION_ROOT_ID_KEY) == null) {
>   sc.setLocalProperty(EXECUTION_ROOT_ID_KEY, executionId.toString)
>   sc.addJobTag(executionIdJobTag(sparkSession, executionId))
> }
> val rootExecutionId = sc.getLocalProperty(EXECUTION_ROOT_ID_KEY).toLong // crashes here
> {code}
> This re-read can return null under high-concurrency scenarios involving 
> nested thread pools (e.g., {{CrossValidator(parallelism=4)}} with 
> {{OneVsRest(parallelism=2)}} running from a Python {{ThreadPoolExecutor}}).
> The fix is to read the property once and use the value directly, avoiding the 
> re-read:
> {code:scala}
> val existingRootId = sc.getLocalProperty(EXECUTION_ROOT_ID_KEY)
> val rootExecutionId = if (existingRootId != null) {
>   existingRootId.toLong
> } else {
>   sc.setLocalProperty(EXECUTION_ROOT_ID_KEY, executionId.toString)
>   sc.addJobTag(executionIdJobTag(sparkSession, executionId))
>   executionId
> }
> {code}
> CI failure demonstrating the flaky test:
> [GitHub Actions Run|https://github.com/Yicong-Huang/spark/actions/runs/21961599500/attempts/1]
> (attempt 1 failed, attempt 2 passed; the failed test was
> {{test_save_load_pipeline_estimator}} in {{CrossValidatorIOPipelineTests}}).
> Stack trace (excerpt):
> {code}
> java.lang.NumberFormatException: Cannot parse null string
>   at java.lang.Long.parseLong(Long.java:550)
>   at SQLExecution.scala:115
> {code}
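
The difference between the two code paths quoted above can be illustrated without Spark. The following is a minimal sketch, not Spark's implementation: {{FakeContext}}, {{RootIdKey}}, {{beforeRead}} and {{dropKey}} are hypothetical names, and the interference (the key vanishing just before every read) is an assumption made only to force the failure deterministically. The report does not say how the property actually becomes null.

```scala
import java.util.Properties
import scala.util.Try

object ReadOnceSketch {
  val RootIdKey = "example.execution.root.id" // hypothetical key name

  // Hypothetical stand-in for a local-property store; this is not Spark's API.
  // `beforeRead` simulates interference from elsewhere just before each read.
  final class FakeContext(beforeRead: Properties => Unit) {
    private val props = new Properties()

    def getLocalProperty(key: String): String = {
      beforeRead(props)
      props.getProperty(key) // null when the key is absent
    }

    def setLocalProperty(key: String, value: String): Unit =
      props.setProperty(key, value)
  }

  // Check, set, then read again: correct only if the second read is non-null.
  def reRead(sc: FakeContext, executionId: Long): Long = {
    if (sc.getLocalProperty(RootIdKey) == null) {
      sc.setLocalProperty(RootIdKey, executionId.toString)
    }
    sc.getLocalProperty(RootIdKey).toLong
  }

  // Read once and reuse the value: there is no second read to go wrong.
  def readOnce(sc: FakeContext, executionId: Long): Long = {
    val existing = sc.getLocalProperty(RootIdKey)
    if (existing != null) {
      existing.toLong
    } else {
      sc.setLocalProperty(RootIdKey, executionId.toString)
      executionId
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumed interference: the key is removed before every read.
    val dropKey: Properties => Unit = p => { p.remove(RootIdKey); () }

    println(Try(reRead(new FakeContext(dropKey), 7L)).isFailure)
    println(readOnce(new FakeContext(dropKey), 7L))
  }
}
```

Under the assumed interference, {{reRead}} fails with a {{NumberFormatException}} because it parses a null string, while {{readOnce}} returns the execution ID it was given. The read-once form does not depend on the property still being present after it is set.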



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
