[jira] [Commented] (SPARK-1767) Prefer HDFS-cached replicas when scheduling data-local tasks

Mridul Muralidharan (JIRA) Sun, 18 May 2014 21:19:31 -0700

    [ https://issues.apache.org/jira/browse/SPARK-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001376#comment-14001376 ]


Mridul Muralidharan commented on SPARK-1767:
--------------------------------------------


I did not realize that email replies to JIRA notifications are not mirrored to JIRA!
Replicating my email here:

-- cut and paste --
Hi Sandy,

  I assume you are referring to the caching added to DataNodes via the new caching API
through the NameNode (to preemptively mmap blocks)?

I have not looked at this in detail, but does the NN report cached replicas in the block locations?
If so, we can simply make those process-local instead of node-local for
executors on that node.
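A minimal sketch of the idea above, assuming the NameNode does report which hosts hold a cached copy of a block alongside the ordinary replica hosts (Hadoop's `BlockLocation.getCachedHosts()` is one possible source). The helper name `locality_for` and the string levels are hypothetical illustrations, not Spark's scheduler code: an executor on a node with a cached copy is upgraded to process-local, while one on a node with only an on-disk replica stays node-local.

```python
from typing import Collection


def locality_for(executor_host: str,
                 replica_hosts: Collection[str],
                 cached_hosts: Collection[str]) -> str:
    """Hypothetical helper: choose a locality level for one executor and one block."""
    if executor_host in cached_hosts:
        # Block is already mmapped in memory on this node: treat it as process-local.
        return "PROCESS_LOCAL"
    if executor_host in replica_hosts:
        # Only an on-disk replica on this node: ordinary node-local read.
        return "NODE_LOCAL"
    # No replica on this node at all.
    return "ANY"


replicas = ["dn1", "dn2", "dn3"]
cached = ["dn2"]
for host in ["dn1", "dn2", "dn4"]:
    print(host, locality_for(host, replicas, cached))
```

With these inputs, `dn2` (cached) comes out as `PROCESS_LOCAL`, `dn1` as `NODE_LOCAL`, and `dn4` as `ANY`.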

This would be a simple change to Hadoop-based RDD partitioning (what makes it
tricky is exposing the currently 'alive' executors to the partition).
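To illustrate why the set of alive executors matters, here is a hedged sketch of how a partition's preferred locations could be computed if a snapshot of live executors were passed in. Everything here is hypothetical: `preferred_locations`, the executor-id strings, and the `alive_executors` mapping (executor id to host) are assumptions for illustration, not an existing Spark or Hadoop API. Executors on cached hosts are listed first, followed by replica hosts with cached ones ordered ahead of the rest.

```python
from typing import Dict, List, Sequence


def preferred_locations(replica_hosts: Sequence[str],
                        cached_hosts: Sequence[str],
                        alive_executors: Dict[str, str]) -> List[str]:
    """Hypothetical: executor ids on cached hosts first, then replica host names.

    alive_executors maps executor id -> host. It is an assumed snapshot that
    the scheduler would have to hand to the partition.
    """
    cached = set(cached_hosts)
    # Executors running on a node that holds an HDFS-cached copy of the block.
    executor_prefs = sorted(eid for eid, host in alive_executors.items()
                            if host in cached)
    # All replica hosts for node-local fallback; the stable sort keeps the
    # original order within each group and puts cached hosts first.
    host_prefs = sorted(replica_hosts, key=lambda h: h not in cached)
    return executor_prefs + host_prefs


alive = {"exec-1": "dn1", "exec-2": "dn2", "exec-3": "dn2"}
print(preferred_locations(["dn1", "dn2", "dn3"], ["dn2"], alive))
```

Here the result is `['exec-2', 'exec-3', 'dn2', 'dn1', 'dn3']`. If the executor snapshot is empty or stale, the function degrades to plain host preferences, which is roughly the behaviour without this change.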

Thanks
Mridul

> Prefer HDFS-cached replicas when scheduling data-local tasks
> ------------------------------------------------------------
>
>                 Key: SPARK-1767
>                 URL: https://issues.apache.org/jira/browse/SPARK-1767
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Sandy Ryza
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
