I was testing out a new project at scale on Spark 2.0.2 running on YARN,
and my job failed with an interesting error message:

TaskSetManager: Lost task 37.3 in stage 31.0 (TID 10684, server.host.name): java.lang.IllegalStateException: There is no space for new record
05:27:09.573     at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.insertRecord(UnsafeInMemorySorter.java:211)
05:27:09.574     at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:127)
05:27:09.574     at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:244)
05:27:09.575     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
05:27:09.575     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
05:27:09.576     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
05:27:09.576     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
05:27:09.577     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
05:27:09.577     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
05:27:09.577     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
05:27:09.578     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
05:27:09.578     at org.apache.spark.scheduler.Task.run(Task.scala:86)
05:27:09.578     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
05:27:09.579     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
05:27:09.579     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
05:27:09.579     at java.lang.Thread.run(Thread.java:745)

I’ve never seen this before, and searching on Google/DDG/JIRA doesn’t yield
any results. There are no other errors coming from that executor, whether
related to memory, storage space, or otherwise.

Could this be a bug? If so, how would I narrow down the source? Otherwise,
how might I work around the issue?

Nick