L. C. Hsieh created SPARK-29469:
-----------------------------------

             Summary: Avoid retries by RetryingBlockFetcher when 
ExternalBlockStoreClient is closed
                 Key: SPARK-29469
                 URL: https://issues.apache.org/jira/browse/SPARK-29469
             Project: Spark
          Issue Type: Improvement
          Components: Shuffle
    Affects Versions: 3.0.0
            Reporter: L. C. Hsieh


An NPE was found in the job log:

2019-10-14 20:06:16 ERROR RetryingBlockFetcher:143 - Exception while beginning fetch of 2 outstanding blocks (after 3 retries)
java.lang.NullPointerException
        at org.apache.spark.network.shuffle.ExternalShuffleClient.lambda$fetchBlocks$0(ExternalShuffleClient.java:100)
        at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
        at org.apache.spark.network.shuffle.RetryingBlockFetcher.lambda$initiateRetry$0(RetryingBlockFetcher.java:169)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)


This happened after BlockManager and ExternalBlockStoreClient were closed due 
to previous errors. In this case, RetryingBlockFetcher does not need to retry. 
The NPE is harmless for job execution, but it is misleading when reading the 
log, especially for end users.
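
A minimal sketch of the idea, not Spark's actual code: the retry loop checks 
whether the client has been closed before each attempt and gives up quietly 
instead of retrying. All names here (SketchClient, SketchRetryingFetcher, 
closeAfterAttempts, fetchAll) are hypothetical and exist only for 
illustration; the real classes have different APIs and retry asynchronously.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a block store client; NOT Spark's class. */
class SketchClient {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final AtomicInteger attempts = new AtomicInteger();
    // Assumption for the demo: close the client after this many fetch
    // attempts; a negative value means it never closes on its own.
    private final int closeAfterAttempts;

    SketchClient(int closeAfterAttempts) {
        this.closeAfterAttempts = closeAfterAttempts;
    }

    boolean isClosed() { return closed.get(); }
    void close() { closed.set(true); }
    int attempts() { return attempts.get(); }

    /** Always fails, to simulate the "previous errors" from the report. */
    void fetch(String[] blockIds) throws IOException {
        int n = attempts.incrementAndGet();
        if (closeAfterAttempts >= 0 && n >= closeAfterAttempts) {
            close(); // simulate shutdown triggered by the earlier failure
        }
        throw new IOException("simulated fetch failure");
    }
}

/** Hypothetical synchronous retry loop that stops once the client is closed. */
class SketchRetryingFetcher {
    private final SketchClient client;
    private final int maxRetries;

    SketchRetryingFetcher(SketchClient client, int maxRetries) {
        this.client = client;
        this.maxRetries = maxRetries;
    }

    /** Returns true on success, false if it gave up. */
    boolean fetchAll(String[] blockIds) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (client.isClosed()) {
                // Client is gone: retrying cannot succeed, so skip it
                // rather than fail again with a confusing exception.
                return false;
            }
            try {
                client.fetch(blockIds);
                return true;
            } catch (IOException e) {
                // Transient failure: fall through to the next attempt.
            }
        }
        return false;
    }
}
```

With a client that never closes, a fetcher configured with 3 retries makes 4 
attempts in total. With a client that closes during the first attempt, it 
stops after that single attempt.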



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

