This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.1 by this push:
new d4ca766 [SPARK-34154][YARN][FOLLOWUP] Fix flaky LocalityPlacementStrategySuite test
d4ca766 is described below
commit d4ca76627ea3c72d240b20cd771f2a55cf318ce4
Author: “attilapiros” <[email protected]>
AuthorDate: Fri Jan 29 23:54:40 2021 +0900
[SPARK-34154][YARN][FOLLOWUP] Fix flaky LocalityPlacementStrategySuite test
### What changes were proposed in this pull request?
Fixing the flaky `handle large number of containers and tasks (SPARK-18750)` test by avoiding the use of `DNSToSwitchMapping`, as in some situations DNS lookups can be extremely slow.
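As an illustration of the approach (the actual change is the small override shown in the diff below), here is a minimal, self-contained sketch; the class name is hypothetical and not part of the patch:
```scala
import org.apache.hadoop.net.{Node, NodeBase}

// Hypothetical sketch (not the actual patch): resolve racks from an in-memory
// rule so that no DNSToSwitchMapping, and therefore no DNS lookup, is involved.
class InMemoryRackResolver {
  def resolve(hostName: String): String =
    if (hostName == "host3") "/rack2" else "/rack1"

  // The batch variant reuses the single-host rule instead of going to DNS.
  def resolve(hostNames: Seq[String]): Seq[Node] =
    hostNames.map(n => new NodeBase(n, resolve(n)))
}
```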
### Why are the changes needed?
After https://github.com/apache/spark/pull/31363 was merged, the flaky `handle large number of containers and tasks (SPARK-18750)` test failed again in some other PRs, but now we have the exact place where the test gets stuck. It is in the DNS lookup:
```
[info] - handle large number of containers and tasks (SPARK-18750) *** FAILED *** (30 seconds, 4 milliseconds)
[info] Failed with an exception or a timeout at thread join:
[info]
[info] java.lang.RuntimeException: Timeout at waiting for thread to stop (its stack trace is added to the exception)
[info] at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
[info] at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
[info] at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
[info] at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
[info] at java.net.InetAddress.getAllByName(InetAddress.java:1193)
[info] at java.net.InetAddress.getAllByName(InetAddress.java:1127)
[info] at java.net.InetAddress.getByName(InetAddress.java:1077)
[info] at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:568)
[info] at org.apache.hadoop.net.NetUtils.normalizeHostNames(NetUtils.java:585)
[info] at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:109)
[info] at org.apache.spark.deploy.yarn.SparkRackResolver.coreResolve(SparkRackResolver.scala:75)
[info] at org.apache.spark.deploy.yarn.SparkRackResolver.resolve(SparkRackResolver.scala:66)
[info] at org.apache.spark.deploy.yarn.LocalityPreferredContainerPlacementStrategy.$anonfun$localityOfRequestedContainers$3(LocalityPreferredContainerPlacementStrategy.scala:142)
[info] at org.apache.spark.deploy.yarn.LocalityPreferredContainerPlacementStrategy$$Lambda$658/1080992036.apply$mcVI$sp(Unknown Source)
[info] at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
[info] at org.apache.spark.deploy.yarn.LocalityPreferredContainerPlacementStrategy.localityOfRequestedContainers(LocalityPreferredContainerPlacementStrategy.scala:138)
[info] at org.apache.spark.deploy.yarn.LocalityPlacementStrategySuite.org$apache$spark$deploy$yarn$LocalityPlacementStrategySuite$$runTest(LocalityPlacementStrategySuite.scala:94)
[info] at org.apache.spark.deploy.yarn.LocalityPlacementStrategySuite$$anon$1.run(LocalityPlacementStrategySuite.scala:40)
[info] at java.lang.Thread.run(Thread.java:748) (LocalityPlacementStrategySuite.scala:61)
...
```
This could be because the DNS servers used by those build machines are not configured to handle IPv6 queries, and the client has to wait for the IPv6 query to time out before falling back to IPv4.
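One way to sanity-check this hypothesis on a build machine is to time the call at the top of the stack trace, `InetAddress.getAllByName`. This diagnostic sketch is not part of the patch, and the probed hostname is arbitrary:
```scala
import java.net.{InetAddress, UnknownHostException}

object DnsLookupTiming {
  def main(args: Array[String]): Unit = {
    val host = if (args.nonEmpty) args(0) else "host1" // any hostname to probe
    val start = System.nanoTime()
    try InetAddress.getAllByName(host)
    catch { case _: UnknownHostException => () } // only the elapsed time matters here
    val elapsedMs = (System.nanoTime() - start) / 1000000
    println(s"getAllByName($host) took $elapsedMs ms")
  }
}
```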
The change even makes the tests more consistent: looking up a single host via `resolve(hostName: String)` used to give a different answer from calling `resolve(hostNames: Seq[String])` with a `Seq` containing that single host.
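With the override added here, both paths agree; a hedged usage sketch, assuming the `MockResolver` from `YarnAllocatorSuite` (see the diff below) is in scope:
```scala
// Assumes the MockResolver defined in YarnAllocatorSuite (see diff below).
val resolver = new MockResolver()
// The single-host path and the batch path now resolve to the same rack.
assert(resolver.resolve("host3") == "/rack2")
assert(resolver.resolve(Seq("host3")).head.getNetworkLocation == "/rack2")
```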
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit tests.
Closes #31397 from attilapiros/SPARK-34154-2nd.
Authored-by: “attilapiros” <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit d3f049cbc274ee64bb9b56d6addba4f2cb8f1f0a)
Signed-off-by: HyukjinKwon <[email protected]>
---
.../test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala
index 825bdd9..9a7eed6 100644
--- a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala
+++ b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala
@@ -22,6 +22,7 @@ import java.util.Collections
import scala.collection.JavaConverters._
import scala.collection.mutable
+import org.apache.hadoop.net.{Node, NodeBase}
import org.apache.hadoop.yarn.api.records._
import org.apache.hadoop.yarn.client.api.AMRMClient
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
@@ -50,6 +51,9 @@ class MockResolver extends SparkRackResolver(SparkHadoopUtil.get.conf) {
if (hostName == "host3") "/rack2" else "/rack1"
}
+ override def resolve(hostNames: Seq[String]): Seq[Node] =
+ hostNames.map(n => new NodeBase(n, resolve(n)))
+
}
class YarnAllocatorSuite extends SparkFunSuite with Matchers with BeforeAndAfterEach {
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]