Mridul Muralidharan created SPARK-23721:
-------------------------------------------
Summary: Use actual node's hostname for host and rack locality
computation
Key: SPARK-23721
URL: https://issues.apache.org/jira/browse/SPARK-23721
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.4.0
Reporter: Mridul Muralidharan
In spark, host and rack locality computation is based on BlockManagerId's
hostname - which is the container's hostname.
When running in containerized environment's like kubernetes, docker support in
hadoop 3, mesos docker support, etc; the hostname reported by container is not
the actual 'host' the container is running on.
This results in spark getting affected in multiple ways.
h3. Suboptimal schedules
Due to host name mismatch between different containers on same physical host,
spark will treat all containers as running on own host.
Effectively, there is no host-locality schedule at all due to this.
In addition, depending on how sophisticated locality script is, it can also
lead to either suboptimal rack locality computation all the way to no
rack-locality schedule entirely.
Hence the performance degradation in scheduler can be significant - only
PROCESS_LOCAL schedules dont get affected.
h3. HDFS reads
This is closely related to "suboptimal schedules" above.
Block locations for hdfs files refer to the datanode hostnames - and not the
container's hostname.
This effectively results in spark ignoring hdfs data placement entirely for
scheduling tasks - resulting in very heavy cross-node/cross-rack data movement.
h3. Speculative execution
Spark schedules speculative tasks on a different host - in order to minimize
the cost of node failures for expensive tasks.
This gets effectively disabled, resulting in speculative tasks potentially
running on the same actual host.
h3. Block replication
Similar to "speculative execution" above, block replication minimizes potential
cost of node loss by typically leveraging another host; which gets effectively
disabled in this case.
Solution for the above is to enhance BlockManagerId to also include the node's
actual hostname via 'nodeHostname' - which should be used for usecases above
instead of the container hostname ('host').
When not relevant, nodeHostname == hostname : which should ensure all existing
functionality continues to work as expected with regressions.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]