Shekharrajak opened a new pull request, #3002:
URL: https://github.com/apache/datafusion-comet/pull/3002

   ## Which issue does this PR close?
   
   
   Closes https://github.com/apache/datafusion-comet/issues/2889
   
   ## Rationale for this change
   
   The sql_hive-1 tests for Spark 4.0 were timing out (hanging indefinitely)
when Comet was enabled. The last suite shown in the logs before the hang was
HivePartitionFilteringSuite. Investigation showed that CometShuffleManager
accessed SparkEnv.get.executorId during initialization via a lazy val, which
could hang when SparkEnv was not yet fully initialized (e.g., during Hive
metastore operations in Spark 4.0).

   This fix defers SparkEnv access until task execution (when
getWriter()/getReader() is called), by which point SparkEnv is expected to be
available, preventing the hang.
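   The difference between reading the environment at construction time and reading it at task time can be sketched in plain Scala. This is a simplified model, not Comet's code: Env, EnvHolder, EagerManager and DeferredManager are hypothetical stand-ins for SparkEnv, SparkEnv.get and the shuffle manager. In this sketch a missing environment surfaces as a NullPointerException rather than the hang seen in CI, since the real hang is not reproduced here.

```scala
// Hypothetical stand-in for SparkEnv; NOT the real Spark API.
final case class Env(executorId: String)

// Hypothetical global holder mimicking SparkEnv.get, which may return null
// until the environment has been set up.
object EnvHolder {
  @volatile var current: Env = null
  def get: Env = current
}

// Reads the environment while the object is being constructed:
// fails if the environment is not ready yet.
class EagerManager {
  val executorId: String = EnvHolder.get.executorId
}

// Defers the read until a task actually asks for a writer, so the
// object can be constructed safely before the environment exists.
class DeferredManager {
  def getWriter(): String = s"writer-on-${EnvHolder.get.executorId}"
}
```

   With no environment set, constructing EagerManager fails immediately, while DeferredManager can be constructed early and only needs the environment once getWriter() runs.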
   
   ## What changes are included in this PR?
   
   - Changed shuffleExecutorComponents from a lazy val that accessed SparkEnv.get during construction to a @volatile variable initialized with double-checked locking.
   - Added a null check that fails with a clear error message if SparkEnv is unexpectedly null.
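   A minimal sketch of double-checked locking on a @volatile field combined with a null check, assuming hypothetical stand-ins (Env, EnvHolder, ExecutorComponents, ManagerSketch) in place of SparkEnv and the real shuffle executor components. The actual CometShuffleManager code and its error message may differ.

```scala
// Hypothetical stand-ins; NOT the real Spark or Comet classes.
final case class Env(executorId: String)
final class ExecutorComponents(val executorId: String)

object EnvHolder {
  @volatile var current: Env = null
  def get: Env = current
}

class ManagerSketch {
  // @volatile so a value written by one thread is visible to other threads.
  @volatile private var components: ExecutorComponents = null

  private def executorComponents: ExecutorComponents = {
    val local = components // first check, without taking the lock
    if (local != null) local
    else synchronized {
      if (components == null) { // second check, under the lock
        val env = EnvHolder.get
        if (env == null) {
          // Hypothetical message: fail loudly instead of caching a bad value.
          throw new IllegalStateException(
            "Env is not initialized; components can only be created at task execution time")
        }
        components = new ExecutorComponents(env.executorId)
      }
      components
    }
  }

  // Environment access happens here, at task time, not in the constructor.
  def getWriter(): String = s"writer-on-${executorComponents.executorId}"
  def getReader(): String = s"reader-on-${executorComponents.executorId}"
}
```

   The unsynchronized first read keeps the common path lock-free once the components exist, and the second check under the lock ensures only one thread builds them. If the environment is missing, nothing is cached, so a later call can still succeed once it has been set.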
   
   ## How are these changes tested?
   
   CI verification: Re-enabled sql_hive-1 tests for Spark 4.0 in the GitHub 
Actions workflow. These tests will run as part of the CI pipeline to verify the 
fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
