This is an automated email from the ASF dual-hosted git repository.

zuston pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-uniffle.git


The following commit(s) were added to refs/heads/master by this push:
     new e0434791a [#1459] fix(server): Memory leak for exceptional scenarios 
when flushing events (#1537)
e0434791a is described below

commit e0434791a9ff206327bbdaab0d5081ffe28da429
Author: RickyMa <[email protected]>
AuthorDate: Sat Feb 24 16:30:31 2024 +0800

    [#1459] fix(server): Memory leak for exceptional scenarios when flushing 
events (#1537)
    
    ### What changes were proposed in this pull request?
    
    Memory may leak in exceptional scenarios when flushing events
    
    ### Why are the changes needed?
    
    A follow-up PR for: https://github.com/apache/incubator-uniffle/pull/1459
    
    I found out that memory may leak in exceptional scenarios when flushing 
events.
    <img width="560" alt="企业微信截图_17086201359387" 
src="https://github.com/apache/incubator-uniffle/assets/13834479/e7b77e82-4336-4123-b59a-621346edd613";>
    <img width="811" alt="企业微信截图_1708620172540" 
src="https://github.com/apache/incubator-uniffle/assets/13834479/931eeb38-0c1b-4cfb-83be-485b74c10e17";>
    Because the following code snippets have not been executed:
    ```
    event.addCleanupCallback(
      () -> {
        this.clearInFlushBuffer(event.getEventId());
        spBlocks.forEach(spb -> spb.getData().release());
      }
    );
    ```
    and
    ```
    event.addCleanupCallback(() -> releaseMemory(event.getSize(), true, false));
    ```
    
    
    It can also lead to heap memory not being released, as 
`HybridStorageManager`.`eventOfUnderStorageManagers` will hold a large amount 
of unreleased heap memory. This is because the code 
`event.addCleanupCallback(() -> eventOfUnderStorageManagers.invalidate(event))` 
has not been executed.
    <img width="986" alt="企业微信截图_17086622982256" 
src="https://github.com/apache/incubator-uniffle/assets/13834479/98721112-8805-4f40-83ef-8daa463b6547";>
    <img width="454" alt="企业微信截图_17086635217020" 
src="https://github.com/apache/incubator-uniffle/assets/13834479/e54b9e77-33de-44dd-b727-55c41bdfc95f";>
    
    
    
    After this PR, the memory will not leak and the exception's stack trace 
will be something like:
    [2024-02-23 14:25:02.614] [LocalFileFlushEventThreadPool-54] [ERROR] 
DefaultFlushEventHandler.handleEventAndUpdateMetrics - Unexpected exceptions 
happened due to
    java.lang.NullPointerException
            at 
org.apache.uniffle.server.ShuffleTaskInfo.getMaxConcurrencyPerPartitionToWrite(ShuffleTaskInfo.java:109)
            at 
org.apache.uniffle.server.ShuffleFlushManager.getMaxConcurrencyPerPartitionWrite(ShuffleFlushManager.java:198)
            at 
org.apache.uniffle.server.ShuffleFlushManager.processFlushEvent(ShuffleFlushManager.java:149)
            at 
org.apache.uniffle.server.DefaultFlushEventHandler.handleEventAndUpdateMetrics(DefaultFlushEventHandler.java:87)
            at 
org.apache.uniffle.server.DefaultFlushEventHandler.lambda$dispatchEvent$0(DefaultFlushEventHandler.java:203)
            at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:750)
    
    We can talk about the above exception in another issue after this PR is 
merged. This PR is focused on fixing potential memory leaks.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Existing UTs.
    
    ---------
    
    Co-authored-by: Enrico Minack <[email protected]>
---
 .../java/org/apache/uniffle/server/DefaultFlushEventHandler.java   | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git 
a/server/src/main/java/org/apache/uniffle/server/DefaultFlushEventHandler.java 
b/server/src/main/java/org/apache/uniffle/server/DefaultFlushEventHandler.java
index 1d4f43ca5..c5b320091 100644
--- 
a/server/src/main/java/org/apache/uniffle/server/DefaultFlushEventHandler.java
+++ 
b/server/src/main/java/org/apache/uniffle/server/DefaultFlushEventHandler.java
@@ -124,8 +124,15 @@ public class DefaultFlushEventHandler implements 
FlushEventHandler {
       }
 
       if (e instanceof EventInvalidException) {
+        // Invalid events have already been released / cleaned up
+        // so no need to call event.doCleanup() here
         return;
       }
+
+      LOG.error(
+          "Unexpected exceptions happened when handling the flush event: {}, 
due to ", event, e);
+      // We need to release the memory when unexpected exceptions happened
+      event.doCleanup();
     } finally {
       if (storage != null) {
         if (storage instanceof HadoopStorage) {

Reply via email to