[jira] [Updated] (BEAM-673) Data locality for Read.Bounded

Amit Sela (JIRA) Thu, 02 Mar 2017 01:47:12 -0800

     [ https://issues.apache.org/jira/browse/BEAM-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Amit Sela updated BEAM-673:
---------------------------
    Description: 
In some distributed filesystems, such as HDFS, we should be able to hint to 
Spark the preferred locations of splits.
Here is an example of how Spark does that for Hadoop RDDs:
https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L249

  was:
In some distributed filesystems, such as HDFS, we should be able to hint to 
Spark the preferred locations of splits.
Here is an example of how Spark does that for Hadoop RDDs:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L252

*Note: in case of 1-to-1 mapping of Read operation (e.g. TextIO) direct 
translation should still be preferred, but this is pending HDFS support for 
Beam anyway.*


> Data locality for Read.Bounded
> ------------------------------
>
>                 Key: BEAM-673
>                 URL: https://issues.apache.org/jira/browse/BEAM-673
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Amit Sela
>             Fix For: First stable release
>
>
> In some distributed filesystems, such as HDFS, we should be able to hint to 
> Spark the preferred locations of splits.
> Here is an example of how Spark does that for Hadoop RDDs:
> https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L249
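
For illustration only, here is a minimal sketch of the idea in plain Java, with no Spark or Beam dependency. It assumes a split can report the hosts that hold its data, in the spirit of Hadoop's InputSplit#getLocations. The linked NewHadoopRDD code appears to fall back to those hosts with "localhost" filtered out, and the sketch mirrors only that fallback. The Split interface and the class and method names are hypothetical and are not part of any runner API.

```java
import java.util.ArrayList;
import java.util.List;

public class PreferredLocationsSketch {

    /** Hypothetical stand-in for a split that knows which hosts store its data. */
    interface Split {
        String[] getLocations();
    }

    /**
     * Returns the hosts to pass to the scheduler as a locality hint.
     * "localhost" is dropped because it does not name a specific node in the cluster.
     */
    static List<String> preferredLocations(Split split) {
        List<String> hosts = new ArrayList<>();
        String[] locations = split.getLocations();
        if (locations == null) {
            return hosts; // no locality information: let the scheduler decide
        }
        for (String host : locations) {
            if (host != null && !host.equals("localhost")) {
                hosts.add(host);
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        // Hypothetical host names, used only for this example.
        Split split = () -> new String[] {"datanode-1", "localhost", "datanode-2"};
        System.out.println(preferredLocations(split)); // prints [datanode-1, datanode-2]
    }
}
```

In an actual runner, a list like this would presumably be returned from an override of Spark's RDD#getPreferredLocations for the partition that wraps the split. How the Beam source would expose its hosts is left open here.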



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
