Github user tillrohrmann commented on the issue:

    https://github.com/apache/flink/pull/3360
  
    Thanks for the clarification @zhijiangW. I know understand the problem that 
we effectively introduce via `RpcEndpoint.runAsync` another message which might 
get "lost" (e.g. due to OOM exception).
    
    I agree with Stephan that it's hard to reason about the consistency of the 
`AkkaRpcActor
    s` internal state once we see an exception. The conservative approach would 
probably be to let it terminate or calling `notifyFatalError` to handle it.
    
    Related to this is also how we handle exceptions in the 
`AkkaRpcActor.handleRpcInvocation`. There we catch all exception and simply 
send them to the caller. I think in this method we should only send the 
non-fatal exceptions back and terminate otherwise.
    
    To follow a similar pattern for the `handleRunAsync` we could think about 
returning a `Future` which we return when calling `RpcEndpoint.runAsync` which 
will be completed with non fatal exceptions or if the `Runnable` has been 
executed. And in case that we see a fatal exception we terminate or call 
`notifyFatalError`. What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Reply via email to