Josh Rosen created SPARK-28340:
----------------------------------

             Summary: Noisy exceptions when tasks are killed: "DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file: java.nio.channels.ClosedByInterruptException"
                 Key: SPARK-28340
                 URL: https://issues.apache.org/jira/browse/SPARK-28340
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.0.0
            Reporter: Josh Rosen

If a Spark task is killed while writing blocks to disk (due to intentional job kills, automated killing of redundant speculative tasks, etc) then Spark may log exceptions like
{code:java}
19/07/10 21:31:08 ERROR storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /<FILENAME>
java.nio.channels.ClosedByInterruptException
	at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
	at sun.nio.ch.FileChannelImpl.truncate(FileChannelImpl.java:372)
	at org.apache.spark.storage.DiskBlockObjectWriter$$anonfun$revertPartialWritesAndClose$2.apply$mcV$sp(DiskBlockObjectWriter.scala:218)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1369)
	at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:214)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:237)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:105)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748){code}
If {{BypassMergeSortShuffleWriter}} is being used then a single cancelled task can result in hundreds of these stacktraces being logged.

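For readers who want to see the underlying JDK behaviour in isolation, here is a minimal sketch, independent of Spark, that appears to reproduce the same exception type. It relies on the documented behaviour of interruptible channels: when a thread whose interrupt status is set performs an I/O operation on a {{FileChannel}}, the channel is closed and {{ClosedByInterruptException}} is thrown. The class name {{InterruptedTruncateDemo}} and the temp-file prefix are made up for illustration; setting the interrupt flag by hand is only an assumed stand-in for what a task kill does to the task thread.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of Spark.
public class InterruptedTruncateDemo {

    /**
     * Writes a few bytes, marks the current thread as interrupted (a stand-in
     * for a task kill), then tries to truncate the file the way a cleanup
     * step would. Returns true if ClosedByInterruptException was raised.
     */
    static boolean truncateWhileInterrupted() throws IOException {
        Path file = Files.createTempFile("partial-write", ".bin");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            channel.write(ByteBuffer.wrap(new byte[] {1, 2, 3, 4}));

            // Simulate the kill: the interrupt flag is set before cleanup runs.
            Thread.currentThread().interrupt();
            try {
                channel.truncate(0);
                return false;
            } catch (ClosedByInterruptException e) {
                return true;
            } finally {
                // Clear the flag so the rest of the demo is unaffected.
                Thread.interrupted();
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("ClosedByInterruptException thrown: " + truncateWhileInterrupted());
    }
}
```

If this reading is right, the exception in the log is a side effect of cleanup running on an already-interrupted thread, which would also explain why a writer holding many open files produces one stack trace per file.
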
Here are some StackOverflow questions asking about this:
 * [https://stackoverflow.com/questions/40027870/spark-jobserver-job-crash]
 * [https://stackoverflow.com/questions/50646953/why-is-java-nio-channels-closedbyinterruptexceptio-called-when-caling-multiple]
 * [https://stackoverflow.com/questions/41867053/java-nio-channels-closedbyinterruptexception-in-spark]
 * [https://stackoverflow.com/questions/56845041/are-closedbyinterruptexception-exceptions-expected-when-spark-speculation-kills]

Can we prevent this exception from occurring? If not, can we treat this "expected exception" in a special manner to avoid log spam? My concern is that the presence of large numbers of spurious exceptions is confusing to users when they are inspecting Spark logs to diagnose other issues.
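
As one possible shape for the "special manner" asked about above, the sketch below catches {{ClosedByInterruptException}} separately from other I/O failures and logs it as a single line without a stack trace. It is only an illustration under assumptions, not Spark's code and not a proposed patch: {{QuietRevert}}, {{RevertAction}}, {{Outcome}} and {{revertQuietly}} are hypothetical names, {{java.util.logging}} stands in for Spark's logging, and the log levels are arbitrary choices.

```java
import java.io.IOException;
import java.nio.channels.ClosedByInterruptException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; names and log levels are assumptions for illustration.
public class QuietRevert {
    private static final Logger LOG = Logger.getLogger(QuietRevert.class.getName());

    /** Stand-in for the cleanup step that truncates and closes a partial file. */
    @FunctionalInterface
    interface RevertAction {
        void run() throws IOException;
    }

    enum Outcome { REVERTED, INTERRUPTED, FAILED }

    static Outcome revertQuietly(String fileName, RevertAction action) {
        try {
            action.run();
            return Outcome.REVERTED;
        } catch (ClosedByInterruptException e) {
            // Expected when the thread was interrupted by a kill:
            // log one line and omit the stack trace.
            LOG.log(Level.INFO,
                    "Interrupted while reverting partial writes to file {0}", fileName);
            return Outcome.INTERRUPTED;
        } catch (IOException e) {
            // Anything else is unexpected and keeps its full stack trace.
            LOG.log(Level.SEVERE,
                    "Uncaught exception while reverting partial writes to file " + fileName, e);
            return Outcome.FAILED;
        }
    }
}
```

The more specific catch clause has to come first because {{ClosedByInterruptException}} is a subclass of {{IOException}}. Whether the quiet branch should log at a lower level or at ERROR without the trace is a judgment call this sketch does not settle.
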