[
https://issues.apache.org/jira/browse/SPARK-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003946#comment-14003946
]
Sandy Ryza commented on SPARK-1767:
-----------------------------------
That is the caching I am referring to. HDFS does expose the locations of
in-memory blocks in the same way it exposes the locations of on-disk blocks.
Unfortunately, I just realized that Spark gets this information through
org.apache.hadoop.mapred.InputSplit#getLocations, which returns only plain
hostnames and does not say which of those hosts holds a cached replica, so I
think we will need to expose this information in MapReduce before we can
expose it in Spark.
> Prefer HDFS-cached replicas when scheduling data-local tasks
> ------------------------------------------------------------
>
> Key: SPARK-1767
> URL: https://issues.apache.org/jira/browse/SPARK-1767
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 1.0.0
> Reporter: Sandy Ryza
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)