[ https://issues.apache.org/jira/browse/SPARK-15352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-15352:
------------------------------------
Assignee: Apache Spark (was: Shubham Chopra)
> Topology aware block replication
> --------------------------------
>
> Key: SPARK-15352
> URL: https://issues.apache.org/jira/browse/SPARK-15352
> Project: Spark
> Issue Type: New Feature
> Components: Block Manager, Mesos, Spark Core, YARN
> Reporter: Shubham Chopra
> Assignee: Apache Spark
>
> With cached RDDs, Spark can be used for online analytics, serving responses to
> online queries. In such use cases, however, losing RDD partitions to node or
> executor failures can cause long delays, because the lost data must be
> regenerated.
> Cached RDDs, even with multiple replicas per block, are currently not
> resilient to node failures when multiple executors run on the same node.
> Block replication currently chooses a peer at random, and that peer may be on
> the same host, so a single host failure can lose both a block and its replica.
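To make this failure mode concrete, here is a minimal Python sketch (not Spark code) of topology-unaware peer selection. It assumes the replication target is drawn uniformly from all executors other than the source, as the description above states; `Executor`, `random_peer`, and `same_host_probability` are hypothetical names used only for illustration.

```python
import random
from dataclasses import dataclass
from fractions import Fraction
from typing import Sequence


@dataclass(frozen=True)
class Executor:
    executor_id: str
    host: str


def random_peer(source: Executor, peers: Sequence[Executor], rng: random.Random) -> Executor:
    """Topology-unaware choice: any executor other than the source, uniformly at random."""
    candidates = [p for p in peers if p != source]
    return rng.choice(candidates)


def same_host_probability(source: Executor, peers: Sequence[Executor]) -> Fraction:
    """Chance that random_peer places the replica on the source's own host."""
    candidates = [p for p in peers if p != source]
    same_host = sum(1 for p in candidates if p.host == source.host)
    return Fraction(same_host, len(candidates))


# Example layout (assumed): 4 hosts, 3 executors per host.
cluster = [Executor(f"exec-{h}-{i}", f"host-{h}") for h in range(4) for i in range(3)]
source = cluster[0]

print(same_host_probability(source, cluster))  # 2/11: 2 of the 11 other executors share host-0
```

In this assumed layout, about 18% of randomly placed replicas land on the source's own host, where one machine failure takes out both copies.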
> This effort would add topology-aware replication to Spark, enabled through
> pluggable strategies. For ease of development and review, it is being broken
> down into three major work efforts:
> 1. Making peer selection for replication pluggable
> 2. Providing pluggable implementations for supplying topology information and
> for topology-aware replication
> 3. Proactive replenishment of lost blocks
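The first two work efforts can be pictured as a strategy interface plus a pluggable topology provider. The Python sketch below is hypothetical and is not Spark's actual API: `PeerSelectionStrategy`, `RandomStrategy`, `TopologyAwareStrategy`, and the host-to-rack `topology` callable are illustrative names. The topology-aware strategy shown here is one simple greedy heuristic, assumed for illustration: prefer a peer on a rack that holds no copy yet, then a host that holds no copy yet, and fall back to same-host peers only when nothing else is left.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Executor:
    executor_id: str
    host: str


# Hypothetical pluggable topology provider: host name -> rack id (None if unknown).
TopologyMapper = Callable[[str], Optional[str]]


class PeerSelectionStrategy(ABC):
    """Pluggable policy: order the peers a block should be replicated to."""

    @abstractmethod
    def prioritize(self, source: Executor, peers: Sequence[Executor],
                   num_replicas: int) -> List[Executor]:
        ...


class RandomStrategy(PeerSelectionStrategy):
    """Topology-unaware baseline: a uniform random sample of the other executors."""

    def __init__(self, rng: random.Random):
        self.rng = rng

    def prioritize(self, source, peers, num_replicas):
        candidates = [p for p in peers if p != source]
        return self.rng.sample(candidates, min(num_replicas, len(candidates)))


class TopologyAwareStrategy(PeerSelectionStrategy):
    """Greedy spread: prefer an unused rack, then an unused host, then anything."""

    def __init__(self, topology: TopologyMapper):
        self.topology = topology

    def prioritize(self, source, peers, num_replicas):
        candidates = [p for p in peers if p != source]
        used_hosts = {source.host}
        used_racks = {self.topology(source.host)}
        chosen: List[Executor] = []

        def penalty(p: Executor):
            # False sorts before True, so (new rack, new host) wins; min() keeps
            # the first candidate on ties, making the result deterministic.
            return (self.topology(p.host) in used_racks, p.host in used_hosts)

        while candidates and len(chosen) < num_replicas:
            best = min(candidates, key=penalty)
            chosen.append(best)
            candidates.remove(best)
            used_hosts.add(best.host)
            used_racks.add(self.topology(best.host))
        return chosen


# Example (assumed layout): host-1 and host-2 share rack-a; host-3 is on rack-b.
racks = {"host-1": "rack-a", "host-2": "rack-a", "host-3": "rack-b"}
e1, e2 = Executor("1", "host-1"), Executor("2", "host-1")
e3, e4 = Executor("3", "host-2"), Executor("4", "host-3")
peers = [e1, e2, e3, e4]

policy = TopologyAwareStrategy(racks.get)
print([p.executor_id for p in policy.prioritize(e1, peers, 2)])  # ['4', '3']
```

Keeping the topology lookup separate from the selection strategy means a deployment without rack information can pass a mapper that returns `None` for every host, in which case this heuristic still spreads replicas across hosts.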
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)