[ https://issues.apache.org/jira/browse/SPARK-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144588#comment-14144588 ]
Sean Owen commented on SPARK-3656:
----------------------------------

Duplicate of https://issues.apache.org/jira/browse/SPARK-3032

This was discussed on the mailing list just today.

> IllegalArgumentException when I using sort-based shuffle
> --------------------------------------------------------
>
>                 Key: SPARK-3656
>                 URL: https://issues.apache.org/jira/browse/SPARK-3656
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 1.1.0
>            Reporter: yangping wu
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> The code works fine with hash-based shuffle.
> {code}
> sc.textFile("file:///export1/spark/zookeeper.out").flatMap(l => l.split(" ")).map(w=>(w,1)).reduceByKey(_ + _).collect
> {code}
> But when I run the same program with sort-based shuffle, it fails with an error:
> {code}
> scala> sc.textFile("file:///export1/spark/zookeeper.out").flatMap(l => l.split(" ")).map(w=>(w,1)).reduceByKey(_ + _).collect
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 22 in stage 1.0 failed 1 times, most recent failure: Lost task 22.0 in stage 1.0 (TID 22, localhost): java.lang.IllegalArgumentException: Comparison method violates its general contract!
>         org.apache.spark.util.collection.Sorter$SortState.mergeHi(Sorter.java:876)
>         org.apache.spark.util.collection.Sorter$SortState.mergeAt(Sorter.java:495)
>         org.apache.spark.util.collection.Sorter$SortState.mergeForceCollapse(Sorter.java:436)
>         org.apache.spark.util.collection.Sorter$SortState.access$300(Sorter.java:294)
>         org.apache.spark.util.collection.Sorter.sort(Sorter.java:137)
>         org.apache.spark.util.collection.AppendOnlyMap.destructiveSortedIterator(AppendOnlyMap.scala:271)
>         org.apache.spark.util.collection.ExternalSorter.spillToMergeableFile(ExternalSorter.scala:323)
>         org.apache.spark.util.collection.ExternalSorter.spill(ExternalSorter.scala:271)
>         org.apache.spark.util.collection.ExternalSorter.maybeSpill(ExternalSorter.scala:249)
>         org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:212)
>         org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:67)
>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         org.apache.spark.scheduler.Task.run(Task.scala:54)
>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>         java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         java.lang.Thread.run(Thread.java:619)
> Driver stacktrace:
>       at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>       at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>       at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>       at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>       at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>       at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
>       at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>       at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>       at scala.Option.foreach(Option.scala:236)
>       at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
>       at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
>       at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>       at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>       at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>       at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>       at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>       at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>       at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>       at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>       at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}
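
For readers unfamiliar with the message "Comparison method violates its general contract!": a TimSort-style merge typically raises it when the comparator it was given is inconsistent, for example not transitive. The thread does not say which comparator is at fault here, so the following is only a minimal sketch of one well-known way such an inconsistency can arise (integer overflow in a subtraction-based comparison). The object and method names are hypothetical and are not taken from Spark.

```scala
object ComparatorContractSketch {
  // Hypothetical comparator, not taken from Spark: the subtraction
  // can overflow Int and flip the sign of the result.
  def bySubtraction(a: Int, b: Int): Int = a - b

  // Overflow-safe alternative using explicit comparisons.
  def byComparison(a: Int, b: Int): Int =
    if (a < b) -1 else if (a == b) 0 else 1

  // True when "a < b and b < c" implies "a < c" under the given comparator.
  def transitiveOn(cmp: (Int, Int) => Int, a: Int, b: Int, c: Int): Boolean =
    !(cmp(a, b) < 0 && cmp(b, c) < 0) || cmp(a, c) < 0

  def main(args: Array[String]): Unit = {
    val (a, b, c) = (Int.MinValue, 0, Int.MaxValue)
    // Int.MinValue - Int.MaxValue wraps around to 1, so a appears greater than c.
    println(s"subtraction transitive: ${transitiveOn(bySubtraction, a, b, c)}") // prints false
    println(s"comparison transitive:  ${transitiveOn(byComparison, a, b, c)}")  // prints true
  }
}
```

Whether a sort actually throws on such a comparator depends on the input data and on how many elements are merged, which may be why a failure like the one above can show up only on some inputs.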