Josh Rosen created SPARK-28340:
----------------------------------

             Summary: Noisy exceptions when tasks are killed: 
"DiskBlockObjectWriter: Uncaught exception while reverting partial writes to 
file: java.nio.channels.ClosedByInterruptException"
                 Key: SPARK-28340
                 URL: https://issues.apache.org/jira/browse/SPARK-28340
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.0.0
            Reporter: Josh Rosen


If a Spark task is killed while writing blocks to disk (due to intentional job 
kills, automated killing of redundant speculative tasks, etc.), then Spark may 
log exceptions like
{code:java}
19/07/10 21:31:08 ERROR storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /<FILENAME>
java.nio.channels.ClosedByInterruptException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
        at sun.nio.ch.FileChannelImpl.truncate(FileChannelImpl.java:372)
        at org.apache.spark.storage.DiskBlockObjectWriter$$anonfun$revertPartialWritesAndClose$2.apply$mcV$sp(DiskBlockObjectWriter.scala:218)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1369)
        at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:214)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:237)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:105)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
        at org.apache.spark.scheduler.Task.run(Task.scala:121)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748){code}
If {{BypassMergeSortShuffleWriter}} is being used then a single cancelled task 
can result in hundreds of these stacktraces being logged.
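The multiplication happens because {{BypassMergeSortShuffleWriter}} keeps one {{DiskBlockObjectWriter}} per reduce partition, and {{stop()}} reverts each of them in turn, so a single interrupted task can fail (and log) once per partition. A minimal, hypothetical model of that fan-out (the class and method names below are illustrative stand-ins, not Spark's actual internals):

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

public class RevertFanoutDemo {
    static final AtomicInteger errorsLogged = new AtomicInteger();

    // Stand-in for DiskBlockObjectWriter.revertPartialWritesAndClose(): on an
    // interrupted task, each revert fails and is logged at ERROR level.
    static void revertPartialWrites(boolean taskInterrupted) {
        if (taskInterrupted) {
            errorsLogged.incrementAndGet(); // one "Uncaught exception ..." stacktrace
        }
    }

    // Stand-in for BypassMergeSortShuffleWriter.stop(): one writer per reduce
    // partition, every one of which gets reverted on task failure.
    static int stop(int numPartitions, boolean taskInterrupted) {
        for (int p = 0; p < numPartitions; p++) {
            revertPartialWrites(taskInterrupted);
        }
        return errorsLogged.get();
    }

    public static void main(String[] args) {
        // A shuffle with 200 reduce partitions: one kill -> 200 ERROR stacktraces.
        System.out.println(stop(200, true));
    }
}
{code}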

Here are some StackOverflow questions asking about this:
 * [https://stackoverflow.com/questions/40027870/spark-jobserver-job-crash]
 * [https://stackoverflow.com/questions/50646953/why-is-java-nio-channels-closedbyinterruptexceptio-called-when-caling-multiple]
 * [https://stackoverflow.com/questions/41867053/java-nio-channels-closedbyinterruptexception-in-spark]
 * [https://stackoverflow.com/questions/56845041/are-closedbyinterruptexception-exceptions-expected-when-spark-speculation-kills]

Can we prevent this exception from occurring? If not, can we treat this 
"expected" exception specially to avoid log spam? My concern is that large 
numbers of spurious exceptions confuse users who are inspecting Spark logs to 
diagnose other issues.
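If the interrupt itself cannot be avoided, one possible direction is to classify the exception before logging it: {{ClosedByInterruptException}} (or any failure on an already-interrupted thread) is the expected kill path and could be logged at DEBUG, while genuine revert failures stay at ERROR. The sketch below is a minimal, hypothetical illustration of that idea; the class and helper names are mine, not Spark APIs:

{code:java}
import java.nio.channels.ClosedByInterruptException;

public class RevertExceptionDemo {
    // Returns true for the benign "task was killed mid-write" case.
    static boolean isExpectedInterrupt(Throwable t) {
        return t instanceof ClosedByInterruptException
            || Thread.currentThread().isInterrupted();
    }

    // Downgrade the expected exception to DEBUG; keep ERROR for real failures.
    static String logLevelFor(Throwable t) {
        return isExpectedInterrupt(t) ? "DEBUG" : "ERROR";
    }

    public static void main(String[] args) {
        System.out.println(logLevelFor(new ClosedByInterruptException())); // DEBUG
        System.out.println(logLevelFor(new RuntimeException("disk full"))); // ERROR
    }
}
{code}

The same classification check would also let the revert path swallow the interrupt-triggered stacktrace entirely while still surfacing unexpected I/O errors.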



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
