[
https://issues.apache.org/jira/browse/SPARK-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003946#comment-14003946
]
Sandy Ryza commented on SPARK-1767:
-----------------------------------
That is the caching I am referring to. HDFS does expose the locations of
in-memory blocks in the same way it exposes the locations of on-disk blocks.
Unfortunately, I just realized that Spark gets this information through
org.apache.hadoop.mapred.InputSplit#getLocations, which returns only plain
hostnames and does not say which of those hosts holds a cached replica, so I
think we will need to expose this information in MapReduce before we can
expose it in Spark.
> Prefer HDFS-cached replicas when scheduling data-local tasks
> ------------------------------------------------------------
>
> Key: SPARK-1767
> URL: https://issues.apache.org/jira/browse/SPARK-1767
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 1.0.0
> Reporter: Sandy Ryza
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)