Mridul Muralidharan created SPARK-23721:
-------------------------------------------

             Summary: Use actual node's hostname for host and rack locality 
computation
                 Key: SPARK-23721
                 URL: https://issues.apache.org/jira/browse/SPARK-23721
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.4.0
            Reporter: Mridul Muralidharan


In spark, host and rack locality computation is based on BlockManagerId's 
hostname - which is the container's hostname.
When running in containerized environment's like kubernetes, docker support in 
hadoop 3, mesos docker support, etc; the hostname reported by container is not 
the actual 'host' the container is running on.
This results in spark getting affected in multiple ways.

h3. Suboptimal schedules

Due to host name mismatch between different containers on same physical host, 
spark will treat all containers as running on own host.
Effectively, there is no host-locality schedule at all due to this.

In addition, depending on how sophisticated locality script is, it can also 
lead to either suboptimal rack locality computation all the way to no 
rack-locality schedule entirely.

Hence the performance degradation in scheduler can be significant - only 
PROCESS_LOCAL schedules dont get affected.

h3. HDFS reads

This is closely related to "suboptimal schedules" above.
Block locations for hdfs files refer to the datanode hostnames - and not the 
container's hostname.
This effectively results in spark ignoring hdfs data placement entirely for 
scheduling tasks - resulting in very heavy cross-node/cross-rack data movement.

h3. Speculative execution

Spark schedules speculative tasks on a different host - in order to minimize 
the cost of node failures for expensive tasks.
This gets effectively disabled, resulting in speculative tasks potentially 
running on the same actual host.

h3. Block replication

Similar to "speculative execution" above, block replication minimizes potential 
cost of node loss by typically leveraging another host; which gets effectively 
disabled in this case.




Solution for the above is to enhance BlockManagerId to also include the node's 
actual hostname via 'nodeHostname' - which should be used for usecases above 
instead of the container hostname ('host').
When not relevant, nodeHostname == hostname : which should ensure all existing 
functionality continues to work as expected with regressions.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to