[jira] [Commented] (FLINK-3163) Configure Flink for NUMA systems

ASF GitHub Bot (JIRA) Wed, 01 Feb 2017 09:26:59 -0800

    [ https://issues.apache.org/jira/browse/FLINK-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848661#comment-15848661 ]


ASF GitHub Bot commented on FLINK-3163:
---------------------------------------

GitHub user greghogan opened a pull request:

    https://github.com/apache/flink/pull/3249

    [FLINK-3163] [scripts] Configure Flink for NUMA systems

    Start a TaskManager on each NUMA node on each worker when the new
    configuration option 'taskmanager.compute.numa' is enabled.

    This does not affect the runtime process for the JobManager (or future
    ResourceManager) as the startup scripts do not provide a simple means of
    disambiguating masters and slaves. I expect most large clusters to run
    these master processes on separate machines, and for small clusters the
    JobManager can run alongside a TaskManager.
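
The per-node startup described above might look roughly like the following
sketch. It is not the code from this pull request: `start_taskmanager` is a
hypothetical placeholder command, the node count is read from the Linux sysfs
directory /sys/devices/system/node, and node ids are assumed to be numbered
contiguously from 0. The commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch only: print one numactl-wrapped launch command per NUMA node.
# `start_taskmanager` is a hypothetical placeholder, not a real Flink script.

# Count NUMA nodes by counting nodeN directories under sysfs.
# Falls back to 1 when none are found (e.g. non-NUMA or non-Linux hosts).
count_numa_nodes() (
    sysfs_dir="${1:-/sys/devices/system/node}"
    count=0
    for dir in "$sysfs_dir"/node[0-9]*; do
        [ -d "$dir" ] && count=$((count + 1))
    done
    [ "$count" -gt 0 ] || count=1
    echo "$count"
)

# Print the command line for each node index 0..nodes-1.
# Assumption: node ids are contiguous starting at 0.
numa_launch_commands() (
    nodes="$1"
    shift
    node=0
    while [ "$node" -lt "$nodes" ]; do
        echo "numactl --membind=$node --cpunodebind=$node $*"
        node=$((node + 1))
    done
)

numa_launch_commands "$(count_numa_nodes)" start_taskmanager
```

Printing the commands instead of running them keeps the sketch usable on
machines without numactl installed; a real startup script would execute each
line and would also need to keep log and pid files distinct per node.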

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/greghogan/flink 3163_configure_flink_for_numa_systems

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3249.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3249
    
----
commit 57767e67dc7306d18df07d5224c81a8d359df620
Author: Greg Hogan <[email protected]>
Date:   2017-02-01T17:13:49Z

    [FLINK-3163] [scripts] Configure Flink for NUMA systems
    
    Start a TaskManager on each NUMA node on each worker when the new
    configuration option 'taskmanager.compute.numa' is enabled.

----


> Configure Flink for NUMA systems
> --------------------------------
>
>                 Key: FLINK-3163
>                 URL: https://issues.apache.org/jira/browse/FLINK-3163
>             Project: Flink
>          Issue Type: Improvement
>          Components: Startup Shell Scripts
>    Affects Versions: 1.0.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>
> On NUMA systems Flink can be pinned to a single physical processor ("node") 
> using {{numactl --membind=$node --cpunodebind=$node <command>}}. Commonly 
> available NUMA systems include the largest AWS and Google Compute instances.
> For example, on an AWS c4.8xlarge system with 36 hyperthreads the user could 
> configure a single TaskManager with 36 slots or have Flink create two 
> TaskManagers, one bound to each NUMA node, each with 18 slots.
> There may be some extra overhead in transferring network buffers between 
> TaskManagers on the same system, though the fraction of data shuffled in this 
> manner decreases with the size of the cluster. The performance improvement 
> from accessing only local memory appears significant, though it is difficult 
> to benchmark.
> The JobManagers may fit into NUMA nodes rather than requiring full systems.
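
The slot arithmetic in the c4.8xlarge example above can be sketched as
follows. The helper name `slots_per_numa_node` is made up for illustration,
and the sketch assumes one slot per hyperthread with the hyperthreads split
evenly across nodes. `taskmanager.numberOfTaskSlots` is the existing Flink
option for slots per TaskManager.

```shell
#!/bin/sh
# Sketch only: slots per TaskManager when one TaskManager runs per NUMA node.
# Assumes one slot per hyperthread and an even split across nodes;
# shell integer division truncates any remainder.
slots_per_numa_node() (
    hyperthreads="$1"
    numa_nodes="$2"
    echo $((hyperthreads / numa_nodes))
)

# c4.8xlarge example: 36 hyperthreads, 2 NUMA nodes.
echo "taskmanager.numberOfTaskSlots: $(slots_per_numa_node 36 2)"
# prints "taskmanager.numberOfTaskSlots: 18"
```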



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
