[
https://issues.apache.org/jira/browse/SPARK-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698071#comment-14698071
]
Apache Spark commented on SPARK-10008:
--------------------------------------
User 'mateiz' has created a pull request for this issue:
https://github.com/apache/spark/pull/8220
> Shuffle locality can take precedence over narrow dependencies for RDDs with both
> --------------------------------------------------------------------------------
>
> Key: SPARK-10008
> URL: https://issues.apache.org/jira/browse/SPARK-10008
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Reporter: Matei Zaharia
>
> The shuffle locality patch made the DAGScheduler aware of shuffle data, but
> for RDDs that have both narrow and shuffle dependencies, it can cause the
> scheduler to place tasks based on the shuffle dependency instead of the narrow
> one. This case is common in iterative join-based algorithms like PageRank and
> ALS, where one RDD is hash-partitioned and the other isn't.
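
The placement problem described above can be illustrated with a small toy model. This is a sketch only, not Spark's DAGScheduler code: the `ToyRDD` class, the `narrow_locs` and `preferred_locs` functions, and the host names are all hypothetical, and each RDD is reduced to a single partition. It assumes that a join-like RDD has a narrow dependency on a cached, hash-partitioned parent and a shuffle dependency on another parent, and shows how the order in which the two kinds of dependency are consulted changes the chosen host.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyRDD:
    """Hypothetical single-partition stand-in for an RDD (not a Spark class)."""
    name: str
    # Hosts where this RDD's partition is cached, if any.
    cached_hosts: List[str] = field(default_factory=list)
    # Parents reached through narrow dependencies.
    narrow_parents: List["ToyRDD"] = field(default_factory=list)
    # Hosts holding the map output for this RDD's shuffle dependency, if any.
    shuffle_hosts: List[str] = field(default_factory=list)


def narrow_locs(rdd: ToyRDD) -> List[str]:
    """Walk narrow dependencies until some cached location is found."""
    if rdd.cached_hosts:
        return rdd.cached_hosts
    for parent in rdd.narrow_parents:
        locs = narrow_locs(parent)
        if locs:
            return locs
    return []


def preferred_locs(rdd: ToyRDD, shuffle_first: bool) -> List[str]:
    """Return the first non-empty location list, in the chosen priority order."""
    candidates = [rdd.shuffle_hosts, narrow_locs(rdd)]
    if not shuffle_first:
        candidates.reverse()
    for locs in candidates:
        if locs:
            return locs
    return []


# "links" is hash-partitioned and cached on host-a; the other join input
# arrives through a shuffle whose map output sits on host-b.
links = ToyRDD("links", cached_hosts=["host-a"])
joined = ToyRDD("joined", narrow_parents=[links], shuffle_hosts=["host-b"])

print(preferred_locs(joined, shuffle_first=True))   # ['host-b']
print(preferred_locs(joined, shuffle_first=False))  # ['host-a']
```

With the shuffle dependency consulted first, the task lands on host-b, away from the cached partition on host-a; consulting the narrow dependency first keeps it on host-a.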