[
https://issues.apache.org/jira/browse/SPARK-15606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shixiong Zhu updated SPARK-15606:
---------------------------------
Affects Version/s: 1.6.2
> Driver hang in o.a.s.DistributedSuite on 2 core machine
> -------------------------------------------------------
>
> Key: SPARK-15606
> URL: https://issues.apache.org/jira/browse/SPARK-15606
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.2, 2.0.0
> Environment: AMD64 box with only 2 cores
> Reporter: Pete Robbins
> Assignee: Pete Robbins
> Fix For: 1.6.2, 2.0.0
>
>
> repeatedly failing task that crashes JVM *** FAILED ***
> The code passed to failAfter did not complete within 100000 milliseconds.
> (DistributedSuite.scala:128)
> This test started failing, and DistributedSuite started hanging, after
> https://github.com/apache/spark/pull/13055
> It looks like the extra message that removes the BlockManager causes a deadlock
> because there are only 2 message-processing loop threads. Related to
> https://issues.apache.org/jira/browse/SPARK-13906
> {code}
> /** Thread pool used for dispatching messages. */
> private val threadpool: ThreadPoolExecutor = {
>   val numThreads = nettyEnv.conf.getInt("spark.rpc.netty.dispatcher.numThreads",
>     math.max(2, Runtime.getRuntime.availableProcessors()))
>   val pool = ThreadUtils.newDaemonFixedThreadPool(numThreads, "dispatcher-event-loop")
>   for (i <- 0 until numThreads) {
>     pool.execute(new MessageLoop)
>   }
>   pool
> }
> {code}
> Setting a minimum of 3 threads (instead of 2) alleviates this issue, but I'm not
> sure there isn't another underlying problem.
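
The suspected starvation can be reproduced in isolation with a plain fixed-size thread pool. The sketch below is not Spark code: `DispatcherStarvationSketch` and `blockedHandlersComplete` are hypothetical names, and the two blocking tasks only stand in for message handlers that wait on a reply which must itself be processed by the same pool. It assumes exactly two such handlers block at once.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

// Hypothetical stand-alone model of the hang; none of these names exist in Spark.
object DispatcherStarvationSketch {

  /** Returns true if both blocking handlers received their reply within timeoutMs. */
  def blockedHandlersComplete(numThreads: Int, timeoutMs: Long): Boolean = {
    val pool = Executors.newFixedThreadPool(numThreads)
    val reply = new CountDownLatch(1)    // the "reply" both handlers wait for
    val finished = new CountDownLatch(2) // counts handlers that got the reply
    try {
      // Two handlers that each occupy a pool thread while waiting for the reply.
      for (_ <- 0 until 2) {
        pool.execute(new Runnable {
          override def run(): Unit = {
            try {
              reply.await()
              finished.countDown()
            } catch {
              // Raised by shutdownNow() when the handler never got its reply.
              case _: InterruptedException => ()
            }
          }
        })
      }
      // The reply can only be delivered by a third task on the same pool.
      pool.execute(new Runnable {
        override def run(): Unit = reply.countDown()
      })
      finished.await(timeoutMs, TimeUnit.MILLISECONDS)
    } finally {
      pool.shutdownNow()
    }
  }
}
```

With `numThreads = 2`, both threads are held by the waiting handlers, the third task stays queued, and the call times out. With `numThreads = 3`, the third task runs and both handlers complete. This only illustrates why a minimum of 3 threads could mask the hang. It does not show which Spark messages are actually blocking.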