Shardul Mahadik created SPARK-36215:
---------------------------------------
Summary: Add logging for slow fetches to diagnose external shuffle service issues
Key: SPARK-36215
URL: https://issues.apache.org/jira/browse/SPARK-36215
Project: Spark
Issue Type: Improvement
Components: Shuffle
Affects Versions: 3.2.0
Reporter: Shardul Mahadik
Currently, task and stage metrics show that slow fetches occurred, and the
logs list _all_ of the shuffle servers those tasks were fetching from. That
set is often large (dozens or even hundreds of servers), so narrowing down
which server caused the problem can be very difficult. We should log a
message whenever a fetch is "slow", as determined by preconfigured thresholds.
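
As a rough illustration of the idea, here is a minimal sketch in Python (not
Spark code). The names SlowFetchThresholds, is_slow_fetch and log_if_slow, and
the two threshold values, are hypothetical and are not existing Spark
configuration keys or APIs. The sketch assumes a fetch counts as "slow" only
when it both runs longer than a duration threshold and falls below a
throughput threshold.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("slow_fetch_sketch")


@dataclass(frozen=True)
class SlowFetchThresholds:
    # Hypothetical thresholds for illustration; not real Spark settings.
    max_duration_ms: int = 5000
    min_throughput_bytes_per_sec: float = 1_000_000.0


def is_slow_fetch(bytes_fetched, duration_ms, thresholds):
    """Return True when a fetch is both long-running and low-throughput."""
    # Short fetches are never flagged, which also avoids dividing by zero
    # (assuming max_duration_ms is non-negative).
    if duration_ms <= thresholds.max_duration_ms:
        return False
    throughput = bytes_fetched / (duration_ms / 1000.0)
    return throughput < thresholds.min_throughput_bytes_per_sec


def log_if_slow(server, bytes_fetched, duration_ms, thresholds):
    """Log a warning naming the shuffle server if the fetch was slow."""
    slow = is_slow_fetch(bytes_fetched, duration_ms, thresholds)
    if slow:
        logger.warning(
            "Slow fetch from %s: %d bytes in %d ms",
            server, bytes_fetched, duration_ms,
        )
    return slow
```

Because the warning names the individual server, a search of the logs for
that message would point at the specific shuffle server rather than the full
set the task contacted.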