Mridul Muralidharan created SPARK-23721:
-------------------------------------------

             Summary: Use actual node's hostname for host and rack locality computation
                 Key: SPARK-23721
                 URL: https://issues.apache.org/jira/browse/SPARK-23721
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.4.0
            Reporter: Mridul Muralidharan


In Spark, host and rack locality computation is based on BlockManagerId's hostname, which is the container's hostname.
When running in containerized environments such as Kubernetes, Docker support in Hadoop 3, Mesos Docker support, etc., the hostname reported by the container is not the actual 'host' the container is running on.
This affects Spark in multiple ways.

h3. Suboptimal schedules

Because hostnames differ between containers on the same physical host, Spark treats each container as running on its own host.
Effectively, there is no host-local scheduling at all.
In addition, depending on how sophisticated the locality script is, this can lead to anything from suboptimal rack locality computation to no rack-local scheduling at all.
Hence the performance degradation in the scheduler can be significant; only PROCESS_LOCAL schedules are unaffected.

h3. HDFS reads

This is closely related to "Suboptimal schedules" above.
Block locations for HDFS files refer to the datanode hostnames, not the container's hostname.
As a result, Spark effectively ignores HDFS data placement when scheduling tasks, which causes very heavy cross-node/cross-rack data movement.

h3. Speculative execution

Spark schedules speculative tasks on a different host in order to minimize the cost of node failures for expensive tasks.
This is effectively disabled, so speculative tasks can end up running on the same actual host.

h3. Block replication

Similar to "Speculative execution" above, block replication minimizes the potential cost of node loss by typically leveraging another host; this too is effectively disabled in this case.

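The hostname mismatch described above can be illustrated with a small standalone Python sketch. This is not Spark code: the `Executor` class, the `host_local_executors` helper, and all hostnames are hypothetical and only model a scheduler that compares the executor's reported host against HDFS block locations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Executor:
    """Hypothetical stand-in for an executor as seen by the scheduler."""
    executor_id: str
    host: str  # hostname reported by the container


def host_local_executors(executors, preferred_hosts):
    """Return executors whose reported host is one of the preferred hosts."""
    preferred = set(preferred_hosts)
    return [e for e in executors if e.host in preferred]


# Two containers that (by assumption) run on the same physical node,
# dn1.example.com, but each reports its own container hostname.
executors = [
    Executor("1", "container-a1b2"),
    Executor("2", "container-c3d4"),
]

# HDFS reports block locations using datanode hostnames.
block_locations = ["dn1.example.com", "dn2.example.com"]

# No container hostname matches a datanode hostname, so no task is
# ever considered host-local.
matches = host_local_executors(executors, block_locations)

# The two containers also look like two distinct hosts, so "pick a
# different host" checks cannot tell that they share a node.
distinct_hosts = {e.host for e in executors}
```

Here `matches` is empty and `distinct_hosts` has two entries, even though both containers share one node that holds a replica of the block.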
The solution is to enhance BlockManagerId to also include the node's actual hostname via 'nodeHostname', which should be used for the use cases above instead of the container hostname ('host').
When not relevant, nodeHostname == hostname, which should ensure that all existing functionality continues to work as expected without regressions.
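A rough sketch of the proposed idea, written as standalone Python rather than as a change to Spark's Scala code. The class below is a simplified, hypothetical model of BlockManagerId; the `locality_host` property, the helper functions, the port number, and the hostnames are assumptions made for illustration and are not part of the proposal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BlockManagerId:
    """Simplified, hypothetical model; not Spark's actual class."""
    executor_id: str
    host: str                            # hostname reported by the container
    port: int
    node_hostname: Optional[str] = None  # proposed: hostname of the actual node

    @property
    def locality_host(self) -> str:
        # Fall back to the container hostname when no node hostname is
        # set, so non-containerized deployments behave as before.
        return self.node_hostname or self.host


def host_local(ids: List[BlockManagerId],
               preferred_hosts: List[str]) -> List[BlockManagerId]:
    """Executors on a node that holds the data (host-local candidates)."""
    preferred = set(preferred_hosts)
    return [b for b in ids if b.locality_host in preferred]


def on_other_nodes(ids: List[BlockManagerId],
                   current: BlockManagerId) -> List[BlockManagerId]:
    """Candidates for speculation or replication on a different node."""
    return [b for b in ids if b.locality_host != current.locality_host]


a = BlockManagerId("1", "container-a1b2", 7337, node_hostname="dn1.example.com")
b = BlockManagerId("2", "container-c3d4", 7337, node_hostname="dn1.example.com")
c = BlockManagerId("3", "dn2.example.com", 7337)  # no node hostname set

local = host_local([a, b, c], ["dn1.example.com"])
peers = on_other_nodes([a, b, c], a)
```

With the node hostname available, both containers on dn1.example.com are recognized as host-local for a block stored there, and only the executor on the other node is offered as a speculation or replication target. A rack lookup could take `locality_host` as its input in the same way.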