lindong28 commented on a change in pull request #15161:
URL: https://github.com/apache/flink/pull/15161#discussion_r600993395
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/operators/coordination/RecreateOnResetOperatorCoordinator.java
##########
@@ -74,7 +74,8 @@ public void start() throws Exception {
@Override
public void close() throws Exception {
closed = true;
- coordinator.closeAsync(closingTimeoutMs);
+ // Wait for coordinator close before destructing any user class loader.
+ coordinator.closeAsync(closingTimeoutMs).get();
Review comment:
@becketqin Thanks for the explanation!
It looks like we don't have clear idea on the root cause of
`ClassNotFoundException`. And there is potential risk (probably unlikely) with
making the call synchronous when user code has blocking operation.
Regarding the 2nd point above, it looks like there could be tradeoff between
resource leak and blocking JobManager's scheduler thread. I suppose it is
better to understand the resource leak before making JobManager's thread
blocking.
And from https://issues.apache.org/jira/browse/FLINK-20114, it looks like
the ClassNotFoundException is emitted in the WARNING without disrupting user's
job. It seems OK to still emit this ClassNotFoundException in the log until we
resolve the concerns described above.
@StephanEwen @becketqin To place on the safe side, I have removed this
change from this PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]