[ https://issues.apache.org/jira/browse/FLINK-12887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869086#comment-16869086 ]
Xiaogang Shi commented on FLINK-12887:
--------------------------------------

Hi [~till.rohrmann], we currently use many unfenced asynchronous operations in the Yarn RM to process notifications from Yarn. Otherwise, the Yarn RM would miss notifications that arrive before it has been granted leadership.

Another case is the timers that release stuck containers. When a Yarn RM restarts, it recovers containers from previous attempts. Some of these containers may be stuck, and we should kill them to release their resources. We use timers to monitor the recovered containers and kill those whose task managers do not register in time. The timers must be unfenced because the Yarn RM may not yet have been granted leadership when it recovers the containers.

> Schedule UnfencedMessage would lost envelope info
> --------------------------------------------------
>
>                 Key: FLINK-12887
>                 URL: https://issues.apache.org/jira/browse/FLINK-12887
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.9.0
>            Reporter: TisonKun
>            Priority: Major
>
> We provide {{runAsync}}, {{callAsync}} and {{scheduleRunAsync}} for
> {{MainThreadExecutable}}, and additionally {{runAsyncWithoutFencing}} and
> {{callAsyncWithoutFencing}} for {{FencedMainThreadExecutable}}.
> Consider a case where we want to schedule an unfenced runnable or any other
> unfenced message (currently we don't have such a code path, but it is
> semantically valid).
> 1. {{FencedAkkaRpcActor}} receives an unfenced runnable with a delay.
> 2. It extracts the runnable from the unfenced message and calls
> {{super.handleRpcMessage}}.
> 3. {{AkkaRpcActor}} envelopes the message and schedules it at
> {{AkkaRpcActor#L410}}.
> However, {{FencedAkkaRpcActor#envelopeSelfMessage}} is the method called to
> build the envelope. Thus the unfenced message becomes a fenced message.
> We could in any case implement {{scheduleRunAsyncWithoutFencing}} to schedule
> an unfenced message directly via {{actorsystem.scheduler.scheduleOnce(...,
> dispatcher)}}, but in the current codebase I notice that {{RunAsync}} has a
> weird {{atTimeNanos}} (i.e., delay) property. Ideally, how a message is
> scheduled should be determined by the parameters the ScheduledExecutorService
> is called with. At the very least, we must not extract an unfenced message,
> envelope it into a fenced message, and then schedule it, because that yields
> the wrong semantics.
> cc [~till.rohrmann]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
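
For readers following the issue, the envelope loss described in steps 1 to 3 can be sketched in plain Java without Akka. This is only an illustrative model, not Flink's implementation: the classes {{BaseActor}}, {{FencedActor}}, {{FencedMessage}} and the methods {{prepareScheduled}}, {{scheduleLosingEnvelope}} and {{scheduleKeepingEnvelope}} are hypothetical names chosen for the example, and the real scheduling call is replaced by simply returning the object that would be handed to the scheduler.

```java
import java.util.UUID;

public class FencingSketch {
    /** Hypothetical stand-in for a delayed runnable message. */
    static final class RunAsync {
        final Runnable runnable;
        final long delayMillis;
        RunAsync(Runnable runnable, long delayMillis) {
            this.runnable = runnable;
            this.delayMillis = delayMillis;
        }
    }

    /** Envelope marking a payload that should bypass the fencing-token check. */
    static final class UnfencedMessage {
        final Object payload;
        UnfencedMessage(Object payload) { this.payload = payload; }
    }

    /** Envelope marking a payload that is only valid for one fencing token. */
    static final class FencedMessage {
        final UUID fencingToken;
        final Object payload;
        FencedMessage(UUID fencingToken, Object payload) {
            this.fencingToken = fencingToken;
            this.payload = payload;
        }
    }

    /** Base actor: envelopes self-messages through an overridable hook. */
    static class BaseActor {
        Object envelopeSelfMessage(Object message) { return message; }

        /** Returns the object that would be passed to the scheduler. */
        Object prepareScheduled(RunAsync runAsync) {
            return envelopeSelfMessage(runAsync);
        }
    }

    static class FencedActor extends BaseActor {
        private final UUID token;
        FencedActor(UUID token) { this.token = token; }

        @Override
        Object envelopeSelfMessage(Object message) {
            return new FencedMessage(token, message);
        }

        /** Problematic path: unwrap, then let the base class re-envelope. */
        Object scheduleLosingEnvelope(UnfencedMessage unfenced) {
            return prepareScheduled((RunAsync) unfenced.payload);
        }

        /** One possible fix (an assumption): schedule the unfenced envelope as-is. */
        Object scheduleKeepingEnvelope(UnfencedMessage unfenced) {
            return unfenced;
        }
    }

    public static void main(String[] args) {
        FencedActor actor = new FencedActor(UUID.randomUUID());
        UnfencedMessage msg = new UnfencedMessage(new RunAsync(() -> {}, 100L));
        // The overridden hook turns the unwrapped runnable into a fenced message.
        System.out.println(actor.scheduleLosingEnvelope(msg).getClass().getSimpleName());
        // Scheduling the original envelope preserves the unfenced semantics.
        System.out.println(actor.scheduleKeepingEnvelope(msg).getClass().getSimpleName());
    }
}
```

In this model the first call yields a {{FencedMessage}} because the subclass override of {{envelopeSelfMessage}} is what the base class invokes, which mirrors the concern raised in the description; the second call shows the behaviour a dedicated unfenced scheduling path would need to preserve.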