[ https://issues.apache.org/jira/browse/BEAM-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amit Sela updated BEAM-673:
---------------------------
Description:
In some distributed filesystems, such as HDFS, we should be able to hint to
Spark the preferred locations of splits.
Here is an example of how Spark does that for Hadoop RDDs:
https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L249
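A minimal sketch of the idea, not the runner's actual code: each split reports the hosts that hold its data, and those hosts are handed to the scheduler as a locality hint (in Spark this hook is RDD.getPreferredLocations). The Split class and the preferredLocations helper below are hypothetical names used only for illustration. Dropping "localhost" is an assumption modelled on what the linked NewHadoopRDD code appears to do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PreferredLocationsSketch {

    /** Hypothetical stand-in for one split of a bounded source; not a Beam or Spark class. */
    static final class Split {
        final String path;
        final List<String> hosts; // hosts that store this split's blocks

        Split(String path, List<String> hosts) {
            this.path = path;
            this.hosts = hosts;
        }
    }

    /**
     * Returns the hosts worth passing to the scheduler as a locality hint.
     * "localhost" is dropped because it does not identify a cluster node.
     */
    static List<String> preferredLocations(Split split) {
        List<String> result = new ArrayList<>();
        for (String host : split.hosts) {
            if (!"localhost".equals(host)) {
                result.add(host);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Split split = new Split("/data/part-00000",
                Arrays.asList("node-1", "localhost", "node-2"));
        // Print the hosts that would be offered as the locality hint.
        System.out.println(preferredLocations(split));
    }
}
```

In a real translation the host list would presumably come from the underlying filesystem (for HDFS, the block locations of the split) rather than being hard-coded.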
was:
In some distributed filesystems, such as HDFS, we should be able to hint to
Spark the preferred locations of splits.
Here is an example of how Spark does that for Hadoop RDDs:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L252
*Note: in the case of a 1-to-1 mapping of a Read operation (e.g. TextIO),
direct translation should still be preferred, but this is pending HDFS support
for Beam anyway.*
> Data locality for Read.Bounded
> ------------------------------
>
> Key: BEAM-673
> URL: https://issues.apache.org/jira/browse/BEAM-673
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Reporter: Amit Sela
> Fix For: First stable release
>
>
> In some distributed filesystems, such as HDFS, we should be able to hint to
> Spark the preferred locations of splits.
> Here is an example of how Spark does that for Hadoop RDDs:
> https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L249