[ https://issues.apache.org/jira/browse/SPARK-19259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vincent geomakowski updated SPARK-19259:
----------------------------------------
Affects Version/s: (was: 2.0.2)
(was: 2.0.1)
(was: 1.6.3)
(was: 2.1.0)
(was: 1.6.2)
(was: 2.0.0)
> spark locality in CNI context
> -----------------------------
>
> Key: SPARK-19259
> URL: https://issues.apache.org/jira/browse/SPARK-19259
> Project: Spark
> Issue Type: Improvement
> Components: Scheduler
> Environment: Mesos and all resource managers using the CNI model
> (Kubernetes, GKE, ECS...)
> Reporter: Vincent geomakowski
> Labels: performance, security
>
> When using the CNI deployment model, each executor gets its own IP/hostname,
> so Spark cannot schedule tasks locally based on the hostnames advertised by
> the backend. Currently, all backends that provide data locality with Spark
> use the same method: they advertise the topology by giving a list of
> hostnames.
> On one hand, data locality is mandatory for large-scale production jobs, as
> it can bring huge performance improvements. On the other hand, CNI is clearly
> the future network model for container deployments, providing easy service
> discovery, isolation, and security policies. So it would be very interesting
> to combine these two features in Spark.
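To illustrate the mismatch described above, here is a minimal sketch in plain Python. It is not Spark's scheduler code. The function `node_local_executors`, the host names, the IP addresses, and the `container_to_node` mapping are all hypothetical, and the mapping is only one conceivable way to recover locality, not something this issue proposes.

```python
# Hypothetical illustration only; this is NOT Spark's scheduler code.
from typing import Dict, List


def node_local_executors(preferred_hosts: List[str],
                         executor_hosts: Dict[str, str]) -> List[str]:
    """Return the IDs of executors whose advertised host is a preferred host."""
    preferred = set(preferred_hosts)
    return sorted(eid for eid, host in executor_hosts.items() if host in preferred)


# The storage layer reports that a data block lives on physical node "node-1".
block_hosts = ["node-1"]

# Host networking: each executor advertises the hostname of its physical node.
host_network = {"exec-1": "node-1", "exec-2": "node-2"}

# CNI: each executor container advertises its own IP/hostname instead.
cni_network = {"exec-1": "10.32.0.5", "exec-2": "10.32.1.7"}

# Assumption: some extra topology source maps container addresses to nodes.
container_to_node = {"10.32.0.5": "node-1", "10.32.1.7": "node-2"}
cni_resolved = {eid: container_to_node.get(addr, addr)
                for eid, addr in cni_network.items()}

print(node_local_executors(block_hosts, host_network))  # ['exec-1']
print(node_local_executors(block_hosts, cni_network))   # []
print(node_local_executors(block_hosts, cni_resolved))  # ['exec-1']
```

With plain hostname matching, the CNI case finds no local executor even though `exec-1` runs on the node that holds the block. Resolving container addresses back to their nodes restores the match in this toy model.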
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)