mateczagany commented on code in PR #27407:
URL: https://github.com/apache/flink/pull/27407#discussion_r2693277077
##########
flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/AbstractRestHandler.java:
##########
@@ -85,14 +85,15 @@ protected CompletableFuture<Void> respondToRequest(
response = FutureUtils.completedExceptionally(e);
}
- return response.thenAccept(
+ return response.thenAcceptAsync(
resp ->
HandlerUtils.sendResponse(
ctx,
httpRequest,
resp,
messageHeaders.getResponseStatusCode(),
- responseHeaders));
+ responseHeaders),
+ ctx.executor());
Review Comment:
@snuyanzin the issue was this stuck thread:
```
Jan 14 21:15:14 at
org.apache.flink.core.testutils.BlockerSync.awaitBlocker(BlockerSync.java:59)
Jan 14 21:15:14 - locked <0x00000000aa141c48> (a java.lang.Object)
Jan 14 21:15:14 at
org.apache.flink.runtime.rest.RestServerEndpointITCase.testShouldWaitForHandlersWhenClosing(RestServerEndpointITCase.java:597)
```
Something has changed internally in Netty, and this test consistently times
out on Azure CI. The issue arises when the response is not written from the
event loop, and Netty enqueues the write operations on the event loop instead,
and then we close the REST server itself (which is the point of the test), the
writes don't always go through properly.
By running the write operations on the event loop, this issue goes away. I
think it's better to do it this way anyways, so that Netty does not have to
create 3 write tasks to eventually execute on the event loop thread.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]