[ 
https://issues.apache.org/jira/browse/FLINK-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451451#comment-15451451
 ] 

ASF GitHub Bot commented on FLINK-4478:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2435#discussion_r76937451
  
    --- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/heartbeat/HeartbeatListener.java
 ---
    @@ -0,0 +1,61 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.runtime.heartbeat;
    +
    +import org.apache.flink.runtime.clusterframework.types.ResourceID;
    +import scala.concurrent.Future;
    +
    +/**
    + * Interface for the interaction with the {@link HeartbeatManager}. The 
heartbeat listener is used
    + * for the following things:
    + * <p>
    + * <ul>
    + *     <il>Notifications about heartbeat timeouts</il>
    + *     <li>Payload reports of incoming heartbeats</li>
    + *     <li>Retrieval of payloads for outgoing heartbeats</li>
    + * </ul>
    + * @param <I> Type of the incoming payload
    + * @param <O> Type of the outgoing payload
    + */
    +public interface HeartbeatListener<I, O> {
    +
    +   /**
    +    * Callback which is called if a heartbeat for the machine identified 
by the given resource
    +    * ID times out.
    +    *
    +    * @param resourceID Resource ID of the machine whose heartbeat has 
timed out
    +    */
    +   void notifyHeartbeatTimeout(ResourceID resourceID);
    +
    +   /**
    +    * Callback which is called whenever a heartbeat with an associated 
payload is received. The
    +    * carried payload is given to this method.
    +    *
    +    * @param payload Payload of the received heartbeat
    +    */
    +   void reportPayload(I payload);
    --- End diff --
    
    Yes, good point. Will add it.


> Implement heartbeat logic
> -------------------------
>
>                 Key: FLINK-4478
>                 URL: https://issues.apache.org/jira/browse/FLINK-4478
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination
>    Affects Versions: 1.1.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>             Fix For: 1.2.0
>
>
> With the Flip-6 refactoring, we'll have the need for a dedicated heartbeat 
> component. The heartbeat component is used to check the liveliness of the 
> distributed components among each other. Furthermore, heartbeats are used to 
> regularly transmit status updates to another component. For example, the 
> TaskManager informs the ResourceManager with each heartbeat about the current 
> slot allocation.
> The heartbeat is initiated from one component. This component sends a 
> heartbeat request to another component which answers with an heartbeat 
> response. Thus, one can differentiate between a sending and a receiving side. 
> Apart from the triggering of the heartbeat request, the logic of treating 
> heartbeats, marking components dead and payload delivery are the same and 
> should be reusable by different distributed components (JM, TM, RM).
> Different models for the heartbeat reporting are conceivable. First of all, 
> the heartbeat request could be sent as an ask operation where the heartbeat 
> response is returned as a future on the sending side. Alternatively, the 
> sending side could request a heartbeat response by sending a tell message. 
> The heartbeat response is then delivered by an RPC back to the heartbeat 
> sender. The latter model has the advantage that a heartbeat response is not 
> tightly coupled to a heartbeat request. Such a tight coupling could cause 
> that heartbeat response are ignored after the future has timed out even 
> though they might still contain valuable information (receiver is still 
> alive).
> Furthermore, different strategies for the heartbeat triggering and marking 
> heartbeat targets as dead are conceivable. For example, we could periodically 
> (with a fixed period) trigger a heartbeat request and mark all targets as 
> dead if we didn't receive a heartbeat response in a given time period. 
> Furthermore, we could adapt the heartbeat interval and heartbeat timeouts 
> with respect to the latency of previous heartbeat responses. This would 
> reflect the current load and network conditions better.
> For the first version, I would propose to use a fixed period heartbeat with a 
> maximum heartbeat timeout before a target is marked dead. Furthermore, I 
> would propose to use tell messages (fire and forget) to request and report 
> heartbeats because they are the more flexible model imho.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to