This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new a40940a0bc6d [SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal`
a40940a0bc6d is described below

commit a40940a0bc6de58b5c56b8ad918f338c6e70572f
Author: Dongjoon Hyun <dh...@apple.com>
AuthorDate: Mon Mar 18 12:39:44 2024 -0700

    [SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal`
    
    ### What changes were proposed in this pull request?
    
    This PR aims to make `BlockManager` log the warning before invoking `removeBlockInternal`, by swapping the order of the two statements. To be clear:
    1. When `removeBlockInternal` succeeds, the log messages are identical before and after this PR.
    2. When `removeBlockInternal` fails, the user now sees one additional warning message, like the following, which was hidden from the users before this PR.
    ```
    logWarning(s"Putting block $blockId failed")
    ```
    
    ### Why are the changes needed?
    
    When a `Put` operation fails, Apache Spark currently invokes `removeBlockInternal` before logging the warning.
    
    
https://github.com/apache/spark/blob/ce93c9fd86715e2479552628398f6fc11e83b2af/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1554-L1567
    
    On top of that, if `removeBlockInternal` then fails as well, Spark logs a warning like the following and the job fails.
    ```
    24/03/18 18:40:46 WARN BlockManager: Putting block broadcast_0 failed due to exception java.nio.file.NoSuchFileException: /data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e.
    24/03/18 18:40:46 WARN BlockManager: Block broadcast_0 was not removed normally.
    24/03/18 18:40:46 INFO TaskSchedulerImpl: Cancelling stage 0
    24/03/18 18:40:46 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage cancelled
    24/03/18 18:40:46 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) failed in 0.264 s due to Job aborted due to stage failure: Task serialization failed: java.nio.file.NoSuchFileException: /data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e
    java.nio.file.NoSuchFileException: /data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e
    ```
    
    This is misleading, even though the two failures may share the same root cause. Since the `Put` operation fails before the failure above, we had better log the WARN message first to make the sequence of events clear.
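    The ordering issue can be sketched outside of Spark. In the minimal sketch below (hypothetical `logWarn`/`cleanup` helpers standing in for Spark's `logWarning` and `removeBlockInternal`, not actual Spark code), a cleanup step that throws before the warning is logged suppresses the message explaining the original `Put` failure:

    ```scala
    import scala.collection.mutable.ArrayBuffer

    // Captured log lines, standing in for a real logger.
    val messages = ArrayBuffer[String]()
    def logWarn(msg: String): Unit = messages += s"WARN $msg"

    // Hypothetical cleanup that itself fails, like removeBlockInternal
    // hitting NoSuchFileException in the report above.
    def cleanup(blockId: String): Unit =
      throw new RuntimeException(s"cleanup of $blockId failed")

    // Before this PR: cleanup throws first, so the warning is never emitted.
    def putFailedBefore(blockId: String): Unit = {
      cleanup(blockId)
      logWarn(s"Putting block $blockId failed")
    }

    // After this PR: the warning is emitted even when cleanup then throws.
    def putFailedAfter(blockId: String): Unit = {
      logWarn(s"Putting block $blockId failed")
      cleanup(blockId)
    }

    try putFailedBefore("broadcast_0") catch { case _: RuntimeException => () }
    try putFailedAfter("broadcast_0") catch { case _: RuntimeException => () }

    // Only the "after" ordering produced the warning.
    assert(messages == ArrayBuffer("WARN Putting block broadcast_0 failed"))
    ```

    Logging before attempting cleanup is the safer ordering in general: the message that explains the original failure cannot be lost to a secondary failure in the cleanup path.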
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. This is a warning message change only.
    
    ### How was this patch tested?
    
    Manual review.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #45570 from dongjoon-hyun/SPARK-47446.
    
    Authored-by: Dongjoon Hyun <dh...@apple.com>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 core/src/main/scala/org/apache/spark/storage/BlockManager.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
index 228ec5752e1b..89b3914e94af 100644
--- a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
+++ b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
@@ -1561,8 +1561,8 @@ private[spark] class BlockManager(
           blockInfoManager.unlock(blockId)
         }
       } else {
-        removeBlockInternal(blockId, tellMaster = false)
         logWarning(s"Putting block $blockId failed")
+        removeBlockInternal(blockId, tellMaster = false)
       }
       res
     } catch {


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
