GitHub user NicoK opened a pull request:

    https://github.com/apache/flink/pull/4506

    [FLINK-7400][cluster] fix off-heap limits set too conservatively in cluster environments

    ## What is the purpose of the change
    
    Inside `ContaineredTaskManagerParameters`, since #3648, `offHeapSize` is set to the amount of memory that Flink itself will use off-heap. In several deployment modes, e.g. YARN or Mesos, this value is then used for `-XX:MaxDirectMemorySize`. It does not account for off-heap memory used by components other than Flink, e.g. RocksDB, other libraries, or the JVM itself.
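    For illustration, here is a minimal Java sketch of how a heap/off-heap split could be turned into JVM options. The class and method names are hypothetical, and this is not Flink's actual launch code. It only shows that whatever ends up in `offHeapSize` becomes a hard cap via `-XX:MaxDirectMemorySize`. Any direct memory needed by something other than Flink's own off-heap memory therefore has to fit under that same number.

    ```java
    import java.util.List;

    public final class JvmMemoryArgsSketch {

        // Hypothetical helper: builds heap and direct-memory JVM options from
        // sizes in megabytes. Flink's real launch code differs; this only shows
        // how the off-heap size turns into a hard -XX:MaxDirectMemorySize cap.
        static List<String> jvmMemoryArgs(long heapMB, long offHeapMB) {
            return List.of(
                "-Xms" + heapMB + "m",
                "-Xmx" + heapMB + "m",
                "-XX:MaxDirectMemorySize=" + offHeapMB + "m");
        }

        public static void main(String[] args) {
            // Example values only: 1024 MB heap, 2048 MB off-heap.
            System.out.println(String.join(" ", jvmMemoryArgs(1024, 2048)));
        }
    }
    ```

    With a 1024 MB heap and 2048 MB off-heap, this prints `-Xms1024m -Xmx1024m -XX:MaxDirectMemorySize=2048m`.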
    
    Please note that this affects at least all batch programs that set the following options (which make little sense for streaming):
    ```
    taskmanager.memory.off-heap=true
    taskmanager.memory.size=<any value>
    taskmanager.memory.preallocate=true
    ```
    If `taskmanager.memory.fraction` is used instead, programs may be safe due to https://issues.apache.org/jira/browse/FLINK-7401, but the additional buffer this provides may be too small, especially if RocksDB or other libraries that use off-heap memory are involved.
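    The option combination listed above can be expressed as a simple check. This is a hypothetical helper, not part of Flink. It assumes that the two boolean options count as `false` when absent and treats any explicit `taskmanager.memory.size` value as an absolute size. Flink's real defaults handling is not modeled.

    ```java
    import java.util.Map;

    public final class AffectedConfigSketch {

        // Hypothetical check for: off-heap managed memory + explicit size + preallocation.
        // Missing boolean keys are assumed to mean "false"; Flink's actual default
        // handling for these options is not modeled here.
        static boolean matchesAffectedSetup(Map<String, String> conf) {
            boolean offHeap = Boolean.parseBoolean(
                conf.getOrDefault("taskmanager.memory.off-heap", "false"));
            boolean preallocate = Boolean.parseBoolean(
                conf.getOrDefault("taskmanager.memory.preallocate", "false"));
            // Any explicit value is treated as an absolute managed memory size.
            boolean explicitSize = conf.containsKey("taskmanager.memory.size");
            return offHeap && explicitSize && preallocate;
        }

        public static void main(String[] args) {
            Map<String, String> conf = Map.of(
                "taskmanager.memory.off-heap", "true",
                "taskmanager.memory.size", "2048",
                "taskmanager.memory.preallocate", "true");
            System.out.println(matchesAffectedSetup(conf)); // true
        }
    }
    ```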
    
    This PR adds the `cutoff` derived from the `containerized.heap-cutoff-ratio`/`containerized.heap-cutoff-min` configuration parameters to `offHeapSize`, as implied by the description of these two options.
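    Below is a minimal Java sketch of the intended arithmetic. It assumes that the cut-off is the larger of `containerized.heap-cutoff-ratio` times the container size and `containerized.heap-cutoff-min`. It deliberately ignores network buffers and the other terms of Flink's real calculation. The class, the method names, and all numbers are hypothetical example values.

    ```java
    public final class CutoffOffHeapSketch {

        // Cut-off as described by containerized.heap-cutoff-ratio and
        // containerized.heap-cutoff-min: the larger of ratio * container size
        // and the configured minimum (all sizes in MB).
        static long cutoffMB(long containerMB, double cutoffRatio, long minCutoffMB) {
            return Math.max(minCutoffMB, (long) (containerMB * cutoffRatio));
        }

        // Before the fix (as described above): only Flink's own off-heap memory.
        static long offHeapBeforeMB(long flinkOffHeapMB) {
            return flinkOffHeapMB;
        }

        // After the fix: Flink's own off-heap memory plus the cut-off.
        static long offHeapAfterMB(long flinkOffHeapMB, long cutoffMB) {
            return flinkOffHeapMB + cutoffMB;
        }

        public static void main(String[] args) {
            long containerMB = 4096;     // example container size
            long flinkOffHeapMB = 2048;  // example off-heap taskmanager.memory.size
            long cutoff = cutoffMB(containerMB, 0.25, 600);
            // Simplified: network buffers are not subtracted here.
            long heapMB = containerMB - cutoff - flinkOffHeapMB;
            System.out.println("cutoff=" + cutoff + " heap=" + heapMB
                + " before=" + offHeapBeforeMB(flinkOffHeapMB)
                + " after=" + offHeapAfterMB(flinkOffHeapMB, cutoff));
            // prints: cutoff=1024 heap=1024 before=2048 after=3072
        }
    }
    ```

    The example uses a 4096 MB container, a 0.25 ratio, a 600 MB minimum, and 2048 MB of off-heap managed memory. This yields a cut-off of 1024 MB and a 1024 MB heap. The off-heap limit grows from 2048 MB to 3072 MB, so heap plus off-heap again adds up to the full container size.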
    
    ## Brief change log
    
    - include the cut-off memory (which is subtracted from the container memory size before the remaining calculations) in the off-heap part
    - add an integration test verifying the bug fix in a YARN environment
    
    ## Verifying this change
    
    This change added tests and can be verified as follows:
    
    - added the `YARNSessionCapacitySchedulerITCase#perJobYarnClusterOffHeap()` test, which validates that enough memory is available and that the bounds are not too strict
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes: memory calculations)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (no)
      - If yes, how is the feature documented? (JavaDocs)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NicoK/flink flink-7400

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4506.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4506
    
----
commit 60d40cde20686b4b1b2d15dc838b15ed0cd994cc
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-08-09T09:53:03Z

    [FLINK-7400][cluster] fix cut-off memory not used for off-heap reserve as 
intended
    
    + fix description of `containerized.heap-cutoff-ratio`

commit 4135a223288608444d324da333cfdd70117c796d
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-08-09T14:16:31Z

    [FLINK-7400][yarn] add an integration test for yarn container memory 
restrictions using off-heap memory

----
