Josh Rosen created SPARK-20715:
----------------------------------
Summary: MapStatuses shouldn't be redundantly stored in both
ShuffleMapStage and MapOutputTracker
Key: SPARK-20715
URL: https://issues.apache.org/jira/browse/SPARK-20715
Project: Spark
Issue Type: Improvement
Components: Scheduler, Shuffle
Affects Versions: 2.3.0
Reporter: Josh Rosen
Assignee: Josh Rosen
Today the MapOutputTracker and ShuffleMapStage each maintain their own copy
of the MapStatuses. This duplication creates the potential for bugs if the two
copies fall out of sync.
I believe that we can improve our ability to reason about the code by storing
this information only in the MapOutputTracker. This can also help to reduce
driver memory consumption.
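
To make the idea concrete, here is a minimal toy model of the "single owner" arrangement. It is only a sketch under stated assumptions, not Spark's actual code: the classes reuse the names from this issue for readability, but every method and field name below (register_shuffle, register_map_output, remove_outputs_on_executor, find_missing_partitions, is_available) is hypothetical and chosen for illustration. The stage keeps no MapStatus list of its own and answers every question by asking the tracker, so there is no second copy that could go stale.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MapStatus:
    """Hypothetical stand-in: records where one map task's output lives."""
    location: str


class MapOutputTracker:
    """Toy tracker: the only place that stores per-shuffle map statuses."""

    def __init__(self) -> None:
        # shuffle id -> one slot per map partition (None = output missing)
        self._statuses: Dict[int, List[Optional[MapStatus]]] = {}

    def register_shuffle(self, shuffle_id: int, num_maps: int) -> None:
        self._statuses[shuffle_id] = [None] * num_maps

    def register_map_output(self, shuffle_id: int, map_id: int,
                            status: MapStatus) -> None:
        self._statuses[shuffle_id][map_id] = status

    def remove_outputs_on_executor(self, location: str) -> None:
        # Forget every output stored at a lost location.
        for statuses in self._statuses.values():
            for i, status in enumerate(statuses):
                if status is not None and status.location == location:
                    statuses[i] = None

    def find_missing_partitions(self, shuffle_id: int) -> List[int]:
        return [i for i, status in enumerate(self._statuses[shuffle_id])
                if status is None]


class ShuffleMapStage:
    """Toy stage: holds no MapStatus copy and delegates to the tracker."""

    def __init__(self, shuffle_id: int, num_partitions: int,
                 tracker: MapOutputTracker) -> None:
        self.shuffle_id = shuffle_id
        self.num_partitions = num_partitions
        self._tracker = tracker
        tracker.register_shuffle(shuffle_id, num_partitions)

    def find_missing_partitions(self) -> List[int]:
        return self._tracker.find_missing_partitions(self.shuffle_id)

    @property
    def is_available(self) -> bool:
        return not self.find_missing_partitions()


# Usage: the stage observes tracker updates without any syncing step.
tracker = MapOutputTracker()
stage = ShuffleMapStage(shuffle_id=0, num_partitions=2, tracker=tracker)
tracker.register_map_output(0, 0, MapStatus("exec-1"))
tracker.register_map_output(0, 1, MapStatus("exec-2"))
assert stage.is_available
tracker.remove_outputs_on_executor("exec-1")
assert stage.find_missing_partitions() == [0]
```

In this model, losing an executor needs only one update, in the tracker. With two copies, the same event would require two updates that must always be applied together, which is the class of bug described above.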
I will provide more details in my PR, where I'll walk through the detailed
arguments as to why we can consolidate these two different metadata tracking
formats without loss of performance or correctness.