1996fanrui commented on code in PR #19993:
URL: https://github.com/apache/flink/pull/19993#discussion_r900803080
##########
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/channel/ChannelStateWriteRequest.java:
##########
@@ -109,6 +112,9 @@ static ChannelStateWriteRequest buildFutureWriteRequest(
}
},
throwable -> {
+ if (!dataFuture.isDone()) {
+ return;
+ }
Review Comment:
Hi @zentol @pnowojski , I replied in
[FLINK-27792](https://issues.apache.org/jira/browse/FLINK-27792). And I think
the root cause of FLINK-27792 is we don't have a reasonable cleanup dataFuture
in discardAction, so we should improve the discardAction.
And I think the
[FLINK-27792](https://issues.apache.org/jira/browse/FLINK-27792) is the same
root cause as [FLINK-28077](https://issues.apache.org/jira/browse/FLINK-28077).
Defective discardAction resulted in FLINK-27792 and FLINK-28077. Once this PR
is done, they will be resolved.
I know this change still has some bugs, it may cause network memory leak. I
will think more and think about how to solve it.
Do you think we can handle FLINK-27792 and FLINK-28077 by this PR? If
anything is wrong, please correct me.
##########
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/channel/ChannelStateWriteRequest.java:
##########
@@ -98,6 +98,9 @@ static ChannelStateWriteRequest buildFutureWriteRequest(
List<Buffer> buffers;
try {
buffers = dataFuture.get();
+ } catch (InterruptedException e) {
+ writer.fail(e);
+ throw e;
Review Comment:
Hi @zentol @pnowojski , I replied in
[FLINK-27792](https://issues.apache.org/jira/browse/FLINK-27792). And I think
the root cause of FLINK-27792 is we don't have a reasonable cleanup dataFuture
in discardAction, so we should improve the discardAction.
I improved action in this change because it is also a bug, we should
continue throw InterruptedException here.
And I think this change and
[FLINK-27792](https://issues.apache.org/jira/browse/FLINK-27792) are different.
And I think the
[FLINK-27792](https://issues.apache.org/jira/browse/FLINK-27792) is the same
root cause as [FLINK-28077](https://issues.apache.org/jira/browse/FLINK-28077),
that is next change in this PR. Defective discardAction resulted in FLINK-27792
and FLINK-28077. Once this PR is done, they will be resolved.
I know next change still has some bugs, it may cause network memory leak. I
will think more and think about how to solve it. What do you think? If anything
is wrong, please correct me.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]