[ 
https://issues.apache.org/jira/browse/SPARK-28340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918306#comment-16918306
 ] 

Saisai Shao commented on SPARK-28340:
-------------------------------------

We also see many of these exceptions in our production environment. They look 
hard to prevent unless we stop using `interrupt` to kill tasks. Maybe we can 
just skip logging such exceptions.
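
To make the "skip logging" idea concrete, here is a minimal sketch in plain Java. It is not Spark's code and not a proposed patch: `QuietRevertSketch`, `revertQuietly`, and `demo` are hypothetical names, and `java.util.logging` stands in for Spark's logger. The only assumption carried over from the stack trace is that `FileChannel.truncate` is the call that throws.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.logging.Level;
import java.util.logging.Logger;

final class QuietRevertSketch {
    private static final Logger LOG = Logger.getLogger("QuietRevertSketch");

    /**
     * Hypothetical helper: truncates the channel back to the last committed
     * position. Returns null on success, otherwise the level at which the
     * failure was logged.
     */
    static Level revertQuietly(FileChannel channel, long committedPosition) {
        try {
            channel.truncate(committedPosition);
            return null;
        } catch (ClosedByInterruptException e) {
            // Expected when the thread was interrupted (e.g. a killed task):
            // log quietly and without the stack trace.
            LOG.log(Level.FINE, "Revert interrupted, probably because the task was killed");
            return Level.FINE;
        } catch (IOException e) {
            // Anything else is still a real error and keeps its stack trace.
            LOG.log(Level.SEVERE, "Uncaught exception while reverting partial writes", e);
            return Level.SEVERE;
        }
    }

    /** Simulates a killed task by setting the interrupt flag before the revert. */
    static Level demo() {
        try {
            Path tmp = Files.createTempFile("quiet-revert", ".bin");
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                Thread.currentThread().interrupt();
                return revertQuietly(ch, 0L);
            } finally {
                Thread.interrupted(); // clear the flag so cleanup is unaffected
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The catch order matters: `ClosedByInterruptException` is a subclass of `IOException`, so it has to be handled first, otherwise the generic branch would log it as an error again.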

> Noisy exceptions when tasks are killed: "DiskBlockObjectWriter: Uncaught 
> exception while reverting partial writes to file: 
> java.nio.channels.ClosedByInterruptException"
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-28340
>                 URL: https://issues.apache.org/jira/browse/SPARK-28340
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Josh Rosen
>            Priority: Minor
>
> If a Spark task is killed while writing blocks to disk (due to intentional 
> job kills, automated killing of redundant speculative tasks, etc.), then 
> Spark may log exceptions like:
> {code:java}
> 19/07/10 21:31:08 ERROR storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /<FILENAME>
> java.nio.channels.ClosedByInterruptException
>       at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>       at sun.nio.ch.FileChannelImpl.truncate(FileChannelImpl.java:372)
>       at org.apache.spark.storage.DiskBlockObjectWriter$$anonfun$revertPartialWritesAndClose$2.apply$mcV$sp(DiskBlockObjectWriter.scala:218)
>       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1369)
>       at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:214)
>       at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:237)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:105)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>       at org.apache.spark.scheduler.Task.run(Task.scala:121)
>       at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
>       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748){code}
> If {{BypassMergeSortShuffleWriter}} is being used then a single cancelled 
> task can result in hundreds of these stack traces being logged.
> Here are some Stack Overflow questions asking about this:
>  * [https://stackoverflow.com/questions/40027870/spark-jobserver-job-crash]
>  * [https://stackoverflow.com/questions/50646953/why-is-java-nio-channels-closedbyinterruptexceptio-called-when-caling-multiple]
>  * [https://stackoverflow.com/questions/41867053/java-nio-channels-closedbyinterruptexception-in-spark]
>  * [https://stackoverflow.com/questions/56845041/are-closedbyinterruptexception-exceptions-expected-when-spark-speculation-kills]
>  
> Can we prevent this exception from occurring? If not, can we treat this 
> "expected exception" in a special manner to avoid log spam? My concern is 
> that the presence of large numbers of spurious exceptions is confusing to 
> users when they are inspecting Spark logs to diagnose other issues.
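
For anyone who wants to see the mechanism outside Spark, the following is a small standalone sketch, assuming only the JDK. The class and method names (`InterruptedTruncateDemo`, `truncateWhileInterrupted`) are made up for illustration. It sets the current thread's interrupt flag, which is roughly what a task kill via `interrupt` does, and then calls `FileChannel.truncate`, the same call that appears in the stack trace above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class InterruptedTruncateDemo {

    /**
     * Calls truncate() on a temp file while the interrupt flag is set.
     * Returns the simple name of the exception that truncate() threw,
     * or "none" if it completed normally.
     */
    static String truncateWhileInterrupted() {
        try {
            Path tmp = Files.createTempFile("revert-demo", ".bin");
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                // Stand-in for a task kill: the worker thread gets interrupted.
                Thread.currentThread().interrupt();
                try {
                    ch.truncate(0L);
                    return "none";
                } catch (ClosedByInterruptException e) {
                    // The channel is interruptible, so it closes itself and
                    // reports the interrupt through this exception.
                    return e.getClass().getSimpleName();
                }
            } finally {
                Thread.interrupted(); // clear the flag before cleaning up
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(truncateWhileInterrupted());
    }
}
```

Because the exception comes from the channel itself once the thread is interrupted, it cannot be avoided at the call site while tasks are killed this way; only the way it is reported can change.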



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

