[
https://issues.apache.org/jira/browse/SPARK-16017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336906#comment-15336906
]
Marcelo Vanzin commented on SPARK-16017:
----------------------------------------
That seems like it might create a similar situation, but it was filed against a
version of Spark that didn't even have the code that caused this particular
regression. So my hunch is: same symptom, different root cause, which may or may
not have been fixed in newer Spark versions.
> YarnClientSchedulerBackend now registers backends as IPs instead of Hostnames
> which causes all tasks to run with RACK_LOCAL locality.
> -------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-16017
> URL: https://issues.apache.org/jira/browse/SPARK-16017
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.2, 2.0.0
> Reporter: Trystan Leftwich
> Priority: Critical
>
> Since this change:
> [SPARK-15395|https://issues.apache.org/jira/browse/SPARK-15395]
> new executor backends are registered by IP address instead of by hostname.
> This has a knock-on effect: when the task manager works out which locality
> level tasks should run at, no task can run at the NODE_LOCAL level.
> The problem is in this specific call. The keys of
> [pendingTasksForHost|https://github.com/apache/spark/blob/branch-2.0/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L886]
> are all hostnames pulled from the DFS locations, while
> [hasExecutorsAliveOnHost|https://github.com/apache/spark/blob/9b234b55d1b5e4a7c80e482b3e297bfb8b583a56/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L560]
> uses
> [executorsByHost|https://github.com/apache/spark/blob/branch-2.0/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L98],
> whose keys are all IPs because they are populated from the RpcAddress.
> As expected, this causes significant performance problems: a simple count
> query takes 22 seconds, but if I revert the change from
> [SPARK-15395|https://issues.apache.org/jira/browse/SPARK-15395], tasks
> run with NODE_LOCAL locality and the same count takes 3 seconds.
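The hostname/IP mismatch described in the issue can be illustrated with a small Scala model. This is a hypothetical sketch, not Spark's code: the object `LocalitySketch`, its helpers, and the hostnames and IPs are made up, and only the field and method names echo the ones linked above. It models only the NODE_LOCAL, RACK_LOCAL and ANY levels and assumes every host sits on a single rack, so the rack check always passes. Under those assumptions it shows that when `executorsByHost` is keyed by IP while `pendingTasksForHost` is keyed by hostname, the NODE_LOCAL check never finds a match and RACK_LOCAL becomes the most local valid level.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified model of the locality check described in the issue.
// Names echo Spark's TaskSetManager/TaskSchedulerImpl fields, but this is NOT
// Spark's implementation. PROCESS_LOCAL and NO_PREF are omitted, and all hosts
// are ASSUMED to be on one rack, so RACK_LOCAL is always valid here.
object LocalitySketch {

  // Preferred hosts per pending task: hostnames taken from DFS block locations.
  val pendingTasksForHost: Map[String, Seq[Int]] = Map(
    "node1.example.com" -> Seq(0, 1),
    "node2.example.com" -> Seq(2, 3)
  )

  // Build executorsByHost from (executorId -> address the executor registered with).
  def executorsByHost(registered: Map[String, String]): Map[String, Set[String]] =
    registered.groupBy(_._2).map { case (host, execs) => host -> execs.keySet }

  // Valid locality levels, most local first.
  def validLevels(byHost: Map[String, Set[String]]): Seq[String] = {
    def hasExecutorsAliveOnHost(host: String): Boolean = byHost.contains(host)
    val levels = ArrayBuffer[String]()
    // NODE_LOCAL needs at least one preferred hostname that has a live executor.
    if (pendingTasksForHost.keySet.exists(hasExecutorsAliveOnHost)) levels += "NODE_LOCAL"
    levels += "RACK_LOCAL" // assumed single rack, so always valid in this model
    levels += "ANY"
    levels.toList
  }

  def main(args: Array[String]): Unit = {
    val byHostname = executorsByHost(
      Map("exec-1" -> "node1.example.com", "exec-2" -> "node2.example.com"))
    val byIp = executorsByHost(Map("exec-1" -> "10.0.0.1", "exec-2" -> "10.0.0.2"))
    println(validLevels(byHostname)) // List(NODE_LOCAL, RACK_LOCAL, ANY)
    println(validLevels(byIp))       // List(RACK_LOCAL, ANY)
  }
}
```

In the IP-keyed case, no key of `pendingTasksForHost` can ever equal a key of `executorsByHost`, so the comparison fails regardless of where the executors actually run.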
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)