[
https://issues.apache.org/jira/browse/SPARK-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mridul Muralidharan updated SPARK-23721:
----------------------------------------
Summary: Enhance BlockManagerId to include container's underlying host name
(was: Use actual node's hostname for host and rack locality computation)
> Enhance BlockManagerId to include container's underlying host name
> ------------------------------------------------------------------
>
> Key: SPARK-23721
> URL: https://issues.apache.org/jira/browse/SPARK-23721
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.4.0
> Reporter: Mridul Muralidharan
> Priority: Major
>
> In Spark, host and rack locality computation is based on BlockManagerId's
> hostname, which is the container's hostname.
> When running in containerized environments such as Kubernetes, Docker support
> in Hadoop 3, Mesos Docker support, etc., the hostname reported by the
> container is not the actual host the container is running on.
> This affects Spark in multiple ways.
> h3. Suboptimal schedules
> Due to the hostname mismatch between different containers on the same
> physical host, Spark will treat each container as running on its own host.
> Effectively, there is no host-local scheduling at all.
> In addition, depending on how sophisticated the locality script is, the
> result can range from suboptimal rack locality computation to no rack-local
> scheduling at all.
> Hence the performance degradation in the scheduler can be significant - only
> PROCESS_LOCAL schedules are not affected.
> h3. HDFS reads
> This is closely related to "Suboptimal schedules" above.
> Block locations for HDFS files refer to the datanode hostnames, not the
> container's hostname.
> As a result, Spark effectively ignores HDFS data placement when scheduling
> tasks, which causes very heavy cross-node/cross-rack data movement.
> h3. Speculative execution
> Spark schedules speculative tasks on a different host in order to minimize
> the cost of node failures for expensive tasks.
> This safeguard is effectively disabled, so a speculative task can end up
> running on the same actual host as the original task.
> h3. Block replication
> Similar to "Speculative execution" above, block replication minimizes the
> potential cost of node loss by typically placing replicas on another host.
> This is also effectively disabled in this case.
> The solution for the above is to enhance BlockManagerId to also include the
> node's actual hostname via 'nodeHostname', which should be used for the use
> cases above instead of the container hostname ('host').
> When not relevant, nodeHostname == host, which should ensure all existing
> functionality continues to work as expected without regressions.
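
The proposal in the description can be illustrated with a minimal Python sketch. It is not Spark's implementation: the class below is a hypothetical stand-in for BlockManagerId, and the field name node_hostname, the helper functions, and all hostnames are assumptions made for illustration. It shows the fallback nodeHostname == host, and how host locality and speculative placement could compare node hostnames instead of container hostnames.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BlockManagerId:
    """Hypothetical stand-in for Spark's BlockManagerId (not the real class)."""
    executor_id: str
    host: str                            # hostname reported by the container
    port: int
    node_hostname: Optional[str] = None  # underlying node (assumed field name)

    def __post_init__(self):
        # When no separate node name is supplied, fall back to the container
        # hostname so that node_hostname == host and behaviour is unchanged.
        if self.node_hostname is None:
            object.__setattr__(self, "node_hostname", self.host)


def same_node(a: BlockManagerId, b: BlockManagerId) -> bool:
    """Two block managers share a node if their node hostnames match."""
    return a.node_hostname == b.node_hostname


def is_host_local(bm: BlockManagerId, block_locations: List[str]) -> bool:
    """Block locations list datanode hostnames, so compare against the node."""
    return bm.node_hostname in block_locations


def speculation_candidates(original: BlockManagerId,
                           executors: List[BlockManagerId]) -> List[BlockManagerId]:
    """Keep only executors that run on a different node than the original."""
    return [e for e in executors if not same_node(e, original)]


# Illustrative values only: two containers on one node, one on another node,
# and one non-containerized executor that relies on the fallback.
a = BlockManagerId("1", "container-a", 7001, "node-1.example.com")
b = BlockManagerId("2", "container-b", 7002, "node-1.example.com")
c = BlockManagerId("3", "container-c", 7003, "node-2.example.com")
d = BlockManagerId("4", "node-3.example.com", 7004)

candidates = speculation_candidates(a, [a, b, c, d])
```

In this sketch, a and b report different container hostnames but compare as the same node, so b is excluded from the speculation candidates for a task running on a, while d behaves exactly as it would without the extra field.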
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)