SHU WANG created SPARK-43987:
--------------------------------
Summary: Separate finalizeShuffleMerge Processing to Dedicated
Thread Pools
Key: SPARK-43987
URL: https://issues.apache.org/jira/browse/SPARK-43987
Project: Spark
Issue Type: Improvement
Components: Shuffle
Affects Versions: 3.4.0, 3.2.0
Reporter: SHU WANG
In our production environment, _finalizeShuffleMerge_ processing takes longer
(p90 is around 20s) than other RPC requests. This is because
_finalizeShuffleMerge_ invokes IO operations such as truncate and file
open/close.
More importantly, processing _finalizeShuffleMerge_ can block other
critical lightweight messages such as authentication, which can cause
authentication timeouts as well as fetch failures. These timeouts and fetch
failures affect the stability of Spark job executions.
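
The idea in the summary (routing _finalizeShuffleMerge_ to its own thread pool so that slow IO cannot starve lightweight messages) can be illustrated with a minimal sketch. This is not the Spark shuffle service code, which is written in Java; the names RpcDispatcher, submit, and demo are hypothetical, the pool sizes are arbitrary, and a threading.Event stands in for the slow truncate/open/close work.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class RpcDispatcher:
    """Hypothetical dispatcher that gives slow messages a dedicated pool."""

    def __init__(self, rpc_threads=1, finalize_threads=1):
        # Pool for lightweight messages such as authentication.
        self._rpc_pool = ThreadPoolExecutor(
            max_workers=rpc_threads, thread_name_prefix="rpc")
        # Separate pool reserved for IO-heavy finalizeShuffleMerge work.
        self._finalize_pool = ThreadPoolExecutor(
            max_workers=finalize_threads, thread_name_prefix="finalize")

    def submit(self, msg_type, handler, *args):
        # Route by message type; everything else shares the RPC pool.
        if msg_type == "finalizeShuffleMerge":
            pool = self._finalize_pool
        else:
            pool = self._rpc_pool
        return pool.submit(handler, *args)

    def shutdown(self):
        self._rpc_pool.shutdown(wait=True)
        self._finalize_pool.shutdown(wait=True)


def demo():
    io_done = threading.Event()

    def slow_finalize():
        # Stand-in for truncate and file open/close; blocks until released.
        io_done.wait(timeout=5)
        return "finalized"

    def authenticate():
        return "authenticated"

    dispatcher = RpcDispatcher()
    try:
        fin = dispatcher.submit("finalizeShuffleMerge", slow_finalize)
        auth = dispatcher.submit("authenticate", authenticate)
        # Authentication completes even though finalize is still blocked.
        auth_result = auth.result(timeout=2)
        finalize_was_pending = not fin.done()
        io_done.set()  # let the simulated IO finish
        return auth_result, finalize_was_pending, fin.result(timeout=5)
    finally:
        dispatcher.shutdown()


if __name__ == "__main__":
    print(demo())
```

With a single shared one-thread pool, the authentication call in this sketch would instead wait behind the blocked finalize handler, which is the failure mode described above.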