Matei Zaharia created SPARK-10008:
-------------------------------------
Summary: Shuffle locality can take precedence over narrow dependencies for RDDs with both
Key: SPARK-10008
URL: https://issues.apache.org/jira/browse/SPARK-10008
Project: Spark
Issue Type: Bug
Components: Scheduler
Reporter: Matei Zaharia
The shuffle locality patch made the DAGScheduler aware of where shuffle data
is located, but for RDDs that have both narrow and shuffle dependencies, it can
cause the scheduler to place tasks based on the shuffle dependency instead of
the narrow one. This case is common in iterative join-based algorithms like
PageRank and ALS, where one RDD is hash-partitioned and the other isn't.
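
To make the ordering problem concrete, here is a toy model in Python. It is
only a sketch: ToyRDD, preferred_locs, the narrow_first flag and the host names
are all hypothetical and do not correspond to Spark's actual classes or to the
eventual fix. It assumes narrow dependencies map partitions one-to-one. The
"joined" RDD has a narrow dependency on a cached, hash-partitioned "links" RDD
and a shuffle dependency whose output sits on a different host.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyRDD:
    """Hypothetical stand-in for an RDD; not a Spark class."""
    name: str
    # partition index -> hosts where that partition is cached
    cached_locs: Dict[int, List[str]] = field(default_factory=dict)
    # parents reached through narrow dependencies (assumed one-to-one partitions)
    narrow_parents: List["ToyRDD"] = field(default_factory=list)
    # partition index -> hosts holding the shuffle output this partition reads
    shuffle_locs: Dict[int, List[str]] = field(default_factory=dict)


def preferred_locs(rdd: ToyRDD, partition: int, narrow_first: bool = True) -> List[str]:
    """Pick hosts for a task, consulting narrow or shuffle dependencies first."""
    cached = rdd.cached_locs.get(partition)
    if cached:
        return cached

    def from_narrow() -> List[str]:
        # Walk narrow parents recursively until one reports a location.
        for parent in rdd.narrow_parents:
            locs = preferred_locs(parent, partition, narrow_first)
            if locs:
                return locs
        return []

    def from_shuffle() -> List[str]:
        return rdd.shuffle_locs.get(partition, [])

    # The order of these two lookups is the behavior at issue in this report.
    lookups = [from_narrow, from_shuffle] if narrow_first else [from_shuffle, from_narrow]
    for lookup in lookups:
        locs = lookup()
        if locs:
            return locs
    return []


# Illustrative data: "links" is cached on host-a, shuffle output is on host-b.
links = ToyRDD("links", cached_locs={0: ["host-a"]})
joined = ToyRDD("joined", narrow_parents=[links], shuffle_locs={0: ["host-b"]})

print(preferred_locs(joined, 0, narrow_first=False))  # ['host-b']
print(preferred_locs(joined, 0, narrow_first=True))   # ['host-a']
```

With the shuffle lookup first, the task is placed next to the shuffle output
even though the cached partition lives elsewhere. Consulting narrow
dependencies first keeps the task next to the cached data.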