1996fanrui commented on code in PR #19993:
URL: https://github.com/apache/flink/pull/19993#discussion_r900803080


##########
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/channel/ChannelStateWriteRequest.java:
##########
@@ -109,6 +112,9 @@ static ChannelStateWriteRequest buildFutureWriteRequest(
                     }
                 },
                 throwable -> {
+                    if (!dataFuture.isDone()) {
+                        return;
+                    }

Review Comment:
   Hi @zentol @pnowojski, I replied in 
[FLINK-27792](https://issues.apache.org/jira/browse/FLINK-27792). I think 
the root cause of FLINK-27792 is that discardAction does not clean up 
dataFuture properly, so we should improve discardAction.
   
   I also think 
[FLINK-27792](https://issues.apache.org/jira/browse/FLINK-27792) has the same 
root cause as [FLINK-28077](https://issues.apache.org/jira/browse/FLINK-28077): 
the defective discardAction caused both. Once this PR is done, they will be 
resolved.
   
   I know this change still has some bugs: it may cause a network memory 
leak. I will think more about how to solve it. 
   
   Do you think we can handle FLINK-27792 and FLINK-28077 in this PR? If 
anything is wrong, please correct me.
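   
   To make the leak concern concrete, here is a minimal, self-contained sketch of a discard action that releases the buffers whenever the future completes, instead of returning early when `dataFuture` is not done yet. It is only an illustration, not the fix in this PR: `DiscardSketch`, `FakeBuffer`, and `discard` are hypothetical names, and `FakeBuffer` is a stand-in, not Flink's `Buffer`.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class DiscardSketch {

    /** Hypothetical stand-in for a network buffer; not Flink's Buffer class. */
    static final class FakeBuffer {
        private boolean recycled;

        void recycle() {
            recycled = true;
        }

        boolean isRecycled() {
            return recycled;
        }
    }

    /**
     * Hypothetical discard action. Rather than skipping cleanup when the future
     * is still pending, it registers a callback, so the buffers are released
     * whether the future is already done or completes later.
     */
    static void discard(CompletableFuture<List<FakeBuffer>> dataFuture) {
        dataFuture.whenComplete(
                (buffers, error) -> {
                    // buffers is null if the future completed exceptionally
                    if (buffers != null) {
                        buffers.forEach(FakeBuffer::recycle);
                    }
                });
    }

    public static void main(String[] args) {
        FakeBuffer buffer = new FakeBuffer();
        CompletableFuture<List<FakeBuffer>> pending = new CompletableFuture<>();

        discard(pending); // future not done yet, nothing to release so far
        System.out.println(buffer.isRecycled()); // prints "false"

        pending.complete(Collections.singletonList(buffer));
        System.out.println(buffer.isRecycled()); // prints "true"
    }
}
```

   With an early return on `!dataFuture.isDone()`, the buffer in the second step would never be recycled in this sketch, which is the kind of leak I am worried about.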



##########
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/channel/ChannelStateWriteRequest.java:
##########
@@ -98,6 +98,9 @@ static ChannelStateWriteRequest buildFutureWriteRequest(
                     List<Buffer> buffers;
                     try {
                         buffers = dataFuture.get();
+                    } catch (InterruptedException e) {
+                        writer.fail(e);
+                        throw e;

Review Comment:
   Hi @zentol @pnowojski, I replied in 
[FLINK-27792](https://issues.apache.org/jira/browse/FLINK-27792). I think 
the root cause of FLINK-27792 is that discardAction does not clean up 
dataFuture properly, so we should improve discardAction.
   
   I improved the action in this change because this is also a bug: we should 
rethrow the InterruptedException here instead of swallowing it. 
   
   I think this change and 
[FLINK-27792](https://issues.apache.org/jira/browse/FLINK-27792) are separate 
issues. FLINK-27792 has the same root cause as 
[FLINK-28077](https://issues.apache.org/jira/browse/FLINK-28077), which the 
next change in this PR addresses: the defective discardAction caused both 
FLINK-27792 and FLINK-28077. Once this PR is done, they will be resolved.
   
   I know the next change still has some bugs: it may cause a network memory 
leak. I will think more about how to solve it. What do you think? If anything 
is wrong, please correct me.
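   
   For reference, the pattern I mean for the InterruptedException is roughly the following: record the failure on the writer, then rethrow so the caller still sees the interruption. This is a standalone sketch under assumptions, not the Flink code itself: `InterruptSketch`, `FakeWriter`, and `awaitData` are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class InterruptSketch {

    /** Hypothetical writer stand-in that only records the failure it was given. */
    static final class FakeWriter {
        Throwable failure;

        void fail(Throwable t) {
            failure = t;
        }
    }

    /**
     * Waits for the data. On interruption, the writer is failed first and the
     * exception is rethrown, so the interruption is not silently lost.
     */
    static String awaitData(CompletableFuture<String> dataFuture, FakeWriter writer)
            throws InterruptedException, ExecutionException {
        try {
            return dataFuture.get();
        } catch (InterruptedException e) {
            writer.fail(e);
            throw e;
        }
    }

    public static void main(String[] args) {
        FakeWriter writer = new FakeWriter();
        // Simulate an interruption arriving before the data is available.
        Thread.currentThread().interrupt();
        try {
            awaitData(new CompletableFuture<>(), writer);
        } catch (InterruptedException | ExecutionException e) {
            System.out.println("caller saw: " + e.getClass().getSimpleName());
        }
        System.out.println("writer failed: " + (writer.failure != null));
    }
}
```

   In this sketch both the writer and the caller observe the interruption, which is the behavior I think we want here.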


