SHU WANG created SPARK-43987:
--------------------------------

             Summary: Separate finalizeShuffleMerge Processing to Dedicated 
Thread Pools
                 Key: SPARK-43987
                 URL: https://issues.apache.org/jira/browse/SPARK-43987
             Project: Spark
          Issue Type: Improvement
          Components: Shuffle
    Affects Versions: 3.4.0, 3.2.0
            Reporter: SHU WANG


In our production environment, _finalizeShuffleMerge_ processing takes longer 
(p90 is around 20s) than other RPC requests. This is because 
_finalizeShuffleMerge_ invokes IO operations such as truncate and file 
open/close.

More importantly, processing _finalizeShuffleMerge_ can block other 
critical lightweight messages such as authentication, which can cause 
authentication timeouts as well as fetch failures. These timeouts and fetch 
failures affect the stability of Spark job execution.
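
To illustrate the idea in the summary, here is a minimal sketch of routing 
slow, IO-heavy requests to their own thread pool so they cannot starve 
lightweight ones. It is not Spark's implementation: the Dispatcher class, 
the message kind strings, the pool sizes, and the handlers are all 
hypothetical and only model the blocking behaviour described above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Dispatcher:
    """Routes messages to a shared pool or to a dedicated pool for slow IO work."""

    def __init__(self, finalize_threads=2):
        # Stand-in for the shared RPC handler threads. One thread is used on
        # purpose so that any blocking handler would stall everything behind it.
        self.rpc_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="rpc")
        # Dedicated pool for IO-heavy finalize requests (size is an assumption).
        self.finalize_pool = ThreadPoolExecutor(
            max_workers=finalize_threads, thread_name_prefix="finalize"
        )

    def submit(self, kind, handler):
        # Only finalize requests leave the shared pool; everything else stays.
        if kind == "finalizeShuffleMerge":
            return self.finalize_pool.submit(handler)
        return self.rpc_pool.submit(handler)

    def shutdown(self):
        self.rpc_pool.shutdown(wait=True)
        self.finalize_pool.shutdown(wait=True)


def run_demo():
    dispatcher = Dispatcher()
    release = threading.Event()

    def slow_finalize():
        # Simulates truncate / file open / file close by blocking until released.
        release.wait(timeout=5)
        return "finalized"

    def authenticate():
        return "authenticated"

    finalize_future = dispatcher.submit("finalizeShuffleMerge", slow_finalize)
    auth_future = dispatcher.submit("authenticate", authenticate)

    # Authentication completes even though the finalize request is still blocked.
    auth_result = auth_future.result(timeout=1)
    finalize_still_running = not finalize_future.done()

    release.set()
    finalize_result = finalize_future.result(timeout=5)
    dispatcher.shutdown()
    return auth_result, finalize_still_running, finalize_result


if __name__ == "__main__":
    print(run_demo())
```

If both kinds of message were submitted to the single-threaded shared pool 
instead, the authentication request would wait behind the finalize request, 
which is the failure mode reported in this issue.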



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
