[
https://issues.apache.org/jira/browse/SPARK-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Patrick Wendell resolved SPARK-3139.
------------------------------------
Resolution: Fixed
Fix Version/s: 1.1.0
Issue resolved by pull request 2143
[https://github.com/apache/spark/pull/2143]
> Akka timeouts from ContextCleaner when cleaning shuffles
> --------------------------------------------------------
>
> Key: SPARK-3139
> URL: https://issues.apache.org/jira/browse/SPARK-3139
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.1.0
> Environment: 10 r3.2xlarge instances on EC2, running the
> scala-agg-by-key-int spark-perf test against master commit
> d7e80c2597d4a9cae2e0cb35a86f7889323f4cbb.
> Reporter: Josh Rosen
> Assignee: Guoqiang Li
> Priority: Blocker
> Fix For: 1.1.0
>
>
> When running spark-perf tests on EC2, I have a job that's consistently
> logging the following Akka exceptions:
> {code}
> 14/08/19 22:07:12 ERROR spark.ContextCleaner: Error cleaning shuffle 0
> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
> at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> at scala.concurrent.Await$.result(package.scala:107)
> at org.apache.spark.storage.BlockManagerMaster.removeShuffle(BlockManagerMaster.scala:118)
> at org.apache.spark.ContextCleaner.doCleanupShuffle(ContextCleaner.scala:159)
> at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:131)
> at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:124)
> at scala.Option.foreach(Option.scala:236)
> at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:124)
> at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:120)
> at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:120)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1252)
> at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:119)
> at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
> {code}
> and
> {code}
> 14/08/19 22:07:12 ERROR storage.BlockManagerMaster: Failed to remove shuffle 0
> akka.pattern.AskTimeoutException: Timed out
> at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)
> at akka.actor.Scheduler$$anon$11.run(Scheduler.scala:118)
> at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
> at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
> at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:455)
> at akka.actor.LightArrayRevolverScheduler$$anon$12.executeBucket$1(Scheduler.scala:407)
> at akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:411)
> at akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> This doesn't seem to prevent the job from completing successfully, but it's a
> serious issue because it means that resources aren't being cleaned up. The
> test script, ScalaAggByKeyInt, runs each test 10 times, and I see the same
> error after each test, so this appears to be deterministically reproducible.
> I'll look at the executor logs to see if I can find more info there.
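The failure mode in the traces above is a blocking Await.result on an actor ask that never receives a reply within the 30-second timeout, as in BlockManagerMaster.removeShuffle. A minimal sketch of that pattern using only scala.concurrent (no Akka or Spark; the object name and short timeout are illustrative):

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

object CleanerTimeoutSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in for the ask sent when removing a shuffle: a future that
    // never completes, as if the remote actor never replied.
    val reply = Promise[Boolean]().future

    try {
      // The cleaning thread blocks here; the real code waits 30 seconds,
      // shortened in this sketch to keep it fast.
      Await.result(reply, 100.millis)
    } catch {
      case e: TimeoutException =>
        // Mirrors "Error cleaning shuffle 0 ... Futures timed out after [...]"
        println(s"Error cleaning shuffle: $e")
    }
  }
}
```

Because the wait is synchronous, a single slow or lost reply stalls the cleaning thread for the full timeout before the error is logged and cleanup of that resource is abandoned.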
--
This message was sent by Atlassian JIRA
(v6.2#6252)