This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new cc78282  [SPARK-34154][YARN][FOLLOWUP] Fix flaky LocalityPlacementStrategySuite test
cc78282 is described below

commit cc782829b7e054a8750912d3a96cf034a7ba081a
Author: “attilapiros” <[email protected]>
AuthorDate: Fri Jan 29 23:54:40 2021 +0900

    [SPARK-34154][YARN][FOLLOWUP] Fix flaky LocalityPlacementStrategySuite test
    
    ### What changes were proposed in this pull request?
    
    Fix the flaky `handle large number of containers and tasks (SPARK-18750)` test by avoiding the use of `DNSToSwitchMapping`, since in some situations DNS lookup can be extremely slow.
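
    As a rough sketch of the approach (the class name `NoDnsMockResolver` is made up here; the real change is the `MockResolver` override shown in the diff below), the test-only resolver keeps the host-to-rack mapping entirely in memory, so neither `resolve` overload ever reaches `DNSToSwitchMapping`:

    ```scala
    package org.apache.spark.deploy.yarn

    import org.apache.hadoop.net.{Node, NodeBase}

    import org.apache.spark.deploy.SparkHadoopUtil

    // Test-only resolver (illustrative name): rack lookups are answered from a
    // fixed in-memory rule, so no real DNS query is ever issued.
    class NoDnsMockResolver extends SparkRackResolver(SparkHadoopUtil.get.conf) {

      override def resolve(hostName: String): String =
        if (hostName == "host3") "/rack2" else "/rack1"

      // This is the new override: without it, the Seq-based path falls back to
      // CachedDNSToSwitchMapping and performs real host-name lookups.
      override def resolve(hostNames: Seq[String]): Seq[Node] =
        hostNames.map(n => new NodeBase(n, resolve(n)))
    }
    ```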
    
    ### Why are the changes needed?
    
    After https://github.com/apache/spark/pull/31363 was merged, the flaky `handle large number of containers and tasks (SPARK-18750)` test failed again in some other PRs, but now we have the exact place where the test is stuck.
    
    It is in the DNS lookup:
    
    ```
    [info] - handle large number of containers and tasks (SPARK-18750) *** FAILED *** (30 seconds, 4 milliseconds)
    [info]   Failed with an exception or a timeout at thread join:
    [info]
    [info]   java.lang.RuntimeException: Timeout at waiting for thread to stop (its stack trace is added to the exception)
    [info]      at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    [info]      at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
    [info]      at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
    [info]      at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
    [info]      at java.net.InetAddress.getAllByName(InetAddress.java:1193)
    [info]      at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    [info]      at java.net.InetAddress.getByName(InetAddress.java:1077)
    [info]      at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:568)
    [info]      at org.apache.hadoop.net.NetUtils.normalizeHostNames(NetUtils.java:585)
    [info]      at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:109)
    [info]      at org.apache.spark.deploy.yarn.SparkRackResolver.coreResolve(SparkRackResolver.scala:75)
    [info]      at org.apache.spark.deploy.yarn.SparkRackResolver.resolve(SparkRackResolver.scala:66)
    [info]      at org.apache.spark.deploy.yarn.LocalityPreferredContainerPlacementStrategy.$anonfun$localityOfRequestedContainers$3(LocalityPreferredContainerPlacementStrategy.scala:142)
    [info]      at org.apache.spark.deploy.yarn.LocalityPreferredContainerPlacementStrategy$$Lambda$658/1080992036.apply$mcVI$sp(Unknown Source)
    [info]      at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
    [info]      at org.apache.spark.deploy.yarn.LocalityPreferredContainerPlacementStrategy.localityOfRequestedContainers(LocalityPreferredContainerPlacementStrategy.scala:138)
    [info]      at org.apache.spark.deploy.yarn.LocalityPlacementStrategySuite.org$apache$spark$deploy$yarn$LocalityPlacementStrategySuite$$runTest(LocalityPlacementStrategySuite.scala:94)
    [info]      at org.apache.spark.deploy.yarn.LocalityPlacementStrategySuite$$anon$1.run(LocalityPlacementStrategySuite.scala:40)
    [info]      at java.lang.Thread.run(Thread.java:748) (LocalityPlacementStrategySuite.scala:61)
    ...
    ```
    
    This could be because the DNS servers used by those build machines are not configured to handle IPv6 queries, so the client has to wait for the IPv6 query to time out before falling back to IPv4.
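
    A quick way to see this effect outside the suite (purely illustrative and not part of the patch; `some-unreachable-host` is a placeholder, not a host the test uses) is to time the same JDK call that appears at the top of the stack trace:

    ```scala
    import java.net.InetAddress

    object LookupTiming {
      def main(args: Array[String]): Unit = {
        val start = System.nanoTime()
        try {
          // Same entry point as in the stack trace above; on an affected machine
          // this can block for many seconds while the IPv6 query times out.
          InetAddress.getByName("some-unreachable-host")
        } catch {
          // The delay, not the result, is what matters here.
          case _: java.net.UnknownHostException =>
        }
        println(s"getByName took ${(System.nanoTime() - start) / 1000000} ms")
      }
    }
    ```

    The patch takes the other route: rather than tuning name resolution on the build machines, the test resolver simply never issues a DNS query at all.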
    
    This also makes the tests more consistent: previously, looking up a single host via `resolve(hostName: String)` could give a different answer from calling `resolve(hostNames: Seq[String])` with a `Seq` containing that single host.
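
    For example (a hedged usage sketch alongside the suite's `MockResolver`, not code from the patch), with the new override both paths agree on the rack of `host3`, the only host `MockResolver` maps to `/rack2`:

    ```scala
    val resolver = new MockResolver()

    // The single-host path...
    assert(resolver.resolve("host3") == "/rack2")

    // ...and the Seq-based path now share the same in-memory mapping.
    assert(resolver.resolve(Seq("host3")).head.getNetworkLocation == "/rack2")
    ```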
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Unit tests.
    
    Closes #31397 from attilapiros/SPARK-34154-2nd.
    
    Authored-by: “attilapiros” <[email protected]>
    Signed-off-by: HyukjinKwon <[email protected]>
    (cherry picked from commit d3f049cbc274ee64bb9b56d6addba4f2cb8f1f0a)
    Signed-off-by: HyukjinKwon <[email protected]>
---
 .../test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala  | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala
index 6216d47..0c40c98 100644
--- a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala
+++ b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala
@@ -21,6 +21,7 @@ import java.util.Collections
 
 import scala.collection.JavaConverters._
 
+import org.apache.hadoop.net.{Node, NodeBase}
 import org.apache.hadoop.yarn.api.records._
 import org.apache.hadoop.yarn.client.api.AMRMClient
 import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
@@ -47,6 +48,9 @@ class MockResolver extends SparkRackResolver(SparkHadoopUtil.get.conf) {
     if (hostName == "host3") "/rack2" else "/rack1"
   }
 
+  override def resolve(hostNames: Seq[String]): Seq[Node] =
+    hostNames.map(n => new NodeBase(n, resolve(n)))
+
 }
 
 class YarnAllocatorSuite extends SparkFunSuite with Matchers with BeforeAndAfterEach {


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
