dzcxzl created SPARK-33158:
------------------------------
Summary: Check whether the executor and external service connection is available
Key: SPARK-33158
URL: https://issues.apache.org/jira/browse/SPARK-33158
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.0.1
Reporter: dzcxzl
At present, the executor establishes a connection with the external shuffle
service and registers with it only once, at initialization.
On YARN, the NodeManager may stop working, so the shuffle service (which runs
inside it) stops working too, while the container/executor process keeps
running. ShuffleMapTasks can still be executed, and the MapStatus they return
still points at the address of the external shuffle service.
When the next stage reads the shuffle data, it cannot connect to the shuffle
service, and the job ultimately fails.
Possible approaches:
Before a ShuffleMapTask starts writing data, check whether the connection is
available; or periodically test whether the connection is healthy, for example
in a thread similar to the driver/executor heartbeat thread.
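A minimal Python sketch of the two options, assuming that a plain TCP connect to the shuffle service's host and port is an acceptable liveness probe. All function and class names here (is_service_reachable, run_map_task_with_precheck, PeriodicReachabilityChecker) are hypothetical and are not Spark APIs; a real change would live in Spark's Scala/Java executor code and would more likely reuse the existing shuffle client than open raw sockets.

```python
import socket
import threading
from typing import Callable


def is_service_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    Assumption: a successful connect is treated as "available". This only shows
    that something accepts connections on the port, not that the shuffle
    service behind it is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False


def run_map_task_with_precheck(write_output: Callable[[], None], host: str, port: int) -> None:
    """Hypothetical option 1: probe the shuffle service before writing map output.

    The task fails up front instead of producing a map status that points
    at a service that cannot serve the data.
    """
    if not is_service_reachable(host, port):
        raise RuntimeError(f"shuffle service {host}:{port} is unreachable; not writing map output")
    write_output()


class PeriodicReachabilityChecker:
    """Hypothetical option 2: a background probe, similar in spirit to a heartbeat thread.

    Probes (host, port) every `interval` seconds and calls
    `on_unreachable(host, port)` each time the service goes from
    reachable to unreachable.
    """

    def __init__(self, host: str, port: int, interval: float,
                 on_unreachable: Callable[[str, int], None], timeout: float = 2.0) -> None:
        self.host, self.port = host, port
        self.interval, self.timeout = interval, timeout
        self.on_unreachable = on_unreachable
        self.healthy = True  # assume reachable until a probe says otherwise
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        while not self._stop.is_set():
            ok = is_service_reachable(self.host, self.port, self.timeout)
            if self.healthy and not ok:
                self.on_unreachable(self.host, self.port)  # fire only on the transition
            self.healthy = ok
            self._stop.wait(self.interval)  # sleep, but wake early if stop() is called
```

For example, with the external shuffle service on its default port (`spark.shuffle.service.port`, 7337), the probe would be `is_service_reachable("nm-host", 7337)`, where "nm-host" is a placeholder. The pre-write check adds one connect round trip to every ShuffleMapTask; the periodic checker avoids that per-task cost but only notices the failure within one `interval`.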
--
This message was sent by Atlassian Jira
(v8.3.4#803005)