[ 
https://issues.apache.org/jira/browse/FLINK-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123744#comment-15123744
 ] 

ASF GitHub Bot commented on FLINK-3161:
---------------------------------------

Github user greghogan commented on the pull request:

    https://github.com/apache/flink/pull/1523#issuecomment-176858076
  
    New commit which passes FLINK_SSH_OPTS to pdsh.
    
    Also, some sample timings for starting and stopping AWS clusters of 
    various sizes. ssh and pdsh are comparable on a single node, while pdsh 
    is roughly 2.3x to 3.3x faster on the 16- and 64-node clusters.
    
    64 x c4.large | ssh | pdsh
    ------------- | --- | ----
    start | 13.969s | 4.210s
    stop | 12.533s | 4.181s
    start | 13.906s | 4.203s
    stop | 13.169s | 4.283s
    start | 14.122s | 4.262s
    stop | 12.343s | 4.196s
    
    16 x c4.large | ssh | pdsh
    ------------- | --- | ----
    start | 3.961s | 1.270s
    stop | 2.985s | 1.267s
    start | 3.638s | 1.277s
    stop | 3.014s | 1.164s
    start | 3.410s | 1.470s
    stop | 3.159s | 1.180s
    
    1 x c4.large | ssh | pdsh
    ------------- | --- | ----
    start | 0.439s | 0.543s
    stop | 1.247s | 0.449s
    start | 0.448s | 0.547s
    stop | 1.439s | 1.300s
    start | 0.439s | 0.542s
    stop | 0.827s | 0.452s
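
For readers who want a single ratio per cluster size, the following is a minimal sketch that averages the three runs of each operation and divides the ssh mean by the pdsh mean. The numbers are copied from the tables above; the `TIMINGS` layout and the `speedup` helper are illustrative names, not part of the pull request.

```python
from statistics import mean

# Wall-clock seconds copied from the tables above, as (ssh, pdsh) per run.
TIMINGS = {
    64: {
        "start": [(13.969, 4.210), (13.906, 4.203), (14.122, 4.262)],
        "stop": [(12.533, 4.181), (13.169, 4.283), (12.343, 4.196)],
    },
    16: {
        "start": [(3.961, 1.270), (3.638, 1.277), (3.410, 1.470)],
        "stop": [(2.985, 1.267), (3.014, 1.164), (3.159, 1.180)],
    },
    1: {
        "start": [(0.439, 0.543), (0.448, 0.547), (0.439, 0.542)],
        "stop": [(1.247, 0.449), (1.439, 1.300), (0.827, 0.452)],
    },
}


def speedup(runs):
    """Return mean ssh time divided by mean pdsh time for a list of runs."""
    ssh_mean = mean(ssh for ssh, _ in runs)
    pdsh_mean = mean(pdsh for _, pdsh in runs)
    return ssh_mean / pdsh_mean


if __name__ == "__main__":
    for nodes, operations in TIMINGS.items():
        for operation, runs in operations.items():
            ratio = speedup(runs)
            print(f"{nodes:>2} x c4.large {operation:<5} ssh/pdsh = {ratio:.2f}")
```

With only three runs per cell these ratios are indicative rather than statistically meaningful; the single-node "stop" figures in particular vary widely between runs.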


> Externalize cluster start-up and tear-down when available
> ---------------------------------------------------------
>
>                 Key: FLINK-3161
>                 URL: https://issues.apache.org/jira/browse/FLINK-3161
>             Project: Flink
>          Issue Type: Improvement
>          Components: Start-Stop Scripts
>    Affects Versions: 1.0.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>            Priority: Minor
>
> I have been using pdsh, pdcp, and rpdcp to both distribute compiled Flink and 
> to start and stop the TaskManagers. The current shell script initializes 
> TaskManagers one-at-a-time. This is trivial to background but would be 
> unthrottled.
> From pdsh's archived homepage: "uses a sliding window of threads to execute 
> remote commands, conserving socket resources while allowing some connections 
> to timeout if needed".
> What other tools could be supported when available?
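
The throttling concern in the description can be illustrated with a bounded worker pool. This is a minimal sketch of the general "sliding window" idea, not pdsh's implementation and not the Flink start-up script: `run_on_hosts`, `ssh_runner`, the `window` parameter, and the timeout value are all hypothetical names and defaults chosen for illustration, and the ssh call assumes passwordless login to each host.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(host, command, timeout=30):
    """Run `command` on `host` over ssh and return its exit code.

    Returns None if the connection does not finish within `timeout`
    seconds, so one unreachable host does not block the others forever.
    """
    try:
        proc = subprocess.run(
            ["ssh", "-n", host, command],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.returncode


def run_on_hosts(hosts, command, window=8, runner=ssh_runner):
    """Run `command` on every host with at most `window` connections open.

    A plain loop handles one host at a time, and backgrounding every call
    opens all connections at once. The pool sits between the two: as soon
    as one worker finishes, the next host is picked up.
    """
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=window) as pool:
        codes = list(pool.map(lambda host: runner(host, command), hosts))
    return dict(zip(hosts, codes))


if __name__ == "__main__":
    # Hypothetical host names and command, for illustration only.
    results = run_on_hosts(["worker1", "worker2"], "hostname", window=2)
    for host, code in results.items():
        print(host, code)
```

Passing a different `runner` makes the helper usable without network access, for example to check that no more than `window` calls ever run at the same time.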



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
