[ 
https://issues.apache.org/jira/browse/FLINK-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502728#comment-14502728
 ] 

ASF GitHub Bot commented on FLINK-1908:
---------------------------------------

Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/609#issuecomment-94442619
  
    The TaskManager uses an exponential backoff strategy to resolve connection
    problems with the JobManager.
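
For readers unfamiliar with the term, exponential backoff can be sketched roughly as below. This is only an illustration in Python, not Flink's actual TaskManager code: the helper name `retry_with_backoff`, the `connect` callable, and all delay values are hypothetical.

```python
import time


def retry_with_backoff(connect, initial_delay=0.5, max_delay=30.0,
                       max_attempts=10, sleep=time.sleep):
    """Hypothetical helper: call connect() until it succeeds.

    The wait between attempts doubles after every failure and is
    capped at max_delay. The last failure is re-raised.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential growth with a cap
```

With such a loop on the worker side, the start-up script would not need to know when the master becomes reachable; the worker simply keeps retrying with growing pauses.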
    
    On Mon, Apr 20, 2015 at 11:07 AM, Max <[email protected]> wrote:
    
    > Thanks for the pull request. Seems to work fine. I was wondering,
    > shouldn't the task managers repeatedly try to build up a connection to the
    > job manager? For me, that seems to be a nicer way to solve this problem.
    > That way, the startup script doesn't need to be aware of the job manager's
    > rpc port.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/flink/pull/609#issuecomment-94401080>.
    >



> JobManager startup delay isn't considered when using start-cluster.sh script
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-1908
>                 URL: https://issues.apache.org/jira/browse/FLINK-1908
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Runtime
>    Affects Versions: 0.9, 0.8.1
>         Environment: Linux
>            Reporter: Lukas Raska
>            Priority: Minor
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> When starting a Flink cluster via the start-cluster.sh script, JobManager
> startup can be delayed (as it's started asynchronously), which can cause
> several task managers to fail to start.
> A solution is to wait a certain amount of time and periodically check whether
> the RPC port is accessible, then proceed with starting the task managers.
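
The proposed "wait and periodically check the RPC port" step could look roughly like the following. This is a minimal Python sketch under stated assumptions, not the change made in the pull request: the function name `wait_for_port` and the timeout and interval defaults are hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Hypothetical helper: poll host:port until it accepts TCP connections.

    Returns True as soon as a connection succeeds, or False once
    `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)  # wait before the next probe
```

A start-up script would call something like `wait_for_port(jobmanager_host, rpc_port)` and start the task managers only if it returns True; both arguments stand in for whatever the cluster configuration provides.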



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
