[ https://issues.apache.org/jira/browse/FLINK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481468#comment-17481468 ]
Yun Gao commented on FLINK-18356:
---------------------------------
After some more investigation, this is what I have found so far (sorry, I have
not reached a final conclusion yet):
# Currently the tests mostly fail in the flink-table/flink-table-planner module.
The tests of this module consist of two parts, the regular tests and the
integration tests, and the failure always happens in the integration test part.
# On Azure there are two parallel surefire test processes. Since the
flink-table-planner module sets reuseForks = true, the same two processes are
used to run all the integration tests. Thus, if some tests do not clean up
correctly or leak memory, the memory used by these processes keeps increasing.
# After adding some print statements to the watchdog process
([https://github.com/apache/flink/pull/18486]), the result
[https://dev.azure.com/gaoyunhaii/gaoyun-flink/_build/results?buildId=562&view=logs&j=43a593e7-535d-554b-08cc-244368da36b4&t=82d122c0-8bbf-56f3-4c0d-8e3d69630d0f]
shows that the total memory is indeed 7G, as Dawid pointed out, and that the
memory usage keeps increasing.
# Fortunately the case can be reproduced locally: after first running
_mvn clean install_ and then running {_}mvn -Dflink.forkCount=2 -Dcheckstyle.skip=true verify -pl
flink-table/flink-table-planner{_}, the memory of the two processes keeps
increasing, up to a maximum of 4G for each process. This is also weird, since we
have limited the heap to 2G. After adding -XX:NativeMemoryTracking=detail to the
JVM options of the surefire plugin, the memory tracking result at the end of the
tests is as follows.
{code:java}
Native Memory Tracking:

Total: reserved=5199583KB +38339KB, committed=3802831KB +44371KB

- Java Heap (reserved=2097152KB, committed=1575936KB)
    (mmap: reserved=2097152KB, committed=1575936KB)

- Class (reserved=2342856KB +37546KB, committed=1534368KB +42666KB)
    (classes #193700 +5400)
    (malloc=38856KB +682KB #351017 +7653)
    (mmap: reserved=2304000KB +36864KB, committed=1495512KB +41984KB)

- Thread (reserved=48453KB -969KB, committed=48453KB -969KB)
    (thread #48 -1)
    (stack: reserved=48188KB -1028KB, committed=48188KB -1028KB)
    (malloc=146KB -3KB #250 -5)
    (arena=119KB +63 #90 -2)

- Code (reserved=287969KB +111KB, committed=244357KB +1023KB)
    (malloc=38369KB +111KB #69847 +655)
    (mmap: reserved=249600KB, committed=205988KB +912KB)

- GC (reserved=146916KB +24KB, committed=127580KB +24KB)
    (malloc=36324KB +24KB #148561 +842)
    (mmap: reserved=110592KB, committed=91256KB)

- Compiler (reserved=442KB, committed=442KB)
    (malloc=312KB #8705 +2)
    (arena=131KB #7)

- Internal (reserved=177788KB +890KB, committed=177784KB +890KB)
    (malloc=177752KB +890KB #316112 +8088)
    (mmap: reserved=36KB, committed=32KB)

- Symbol (reserved=43952KB +93KB, committed=43952KB +93KB)
    (malloc=42122KB +93KB #393043 +1171)
    (arena=1830KB #1)

- Native Memory Tracking (reserved=21378KB +675KB, committed=21378KB +675KB)
    (malloc=1011KB +314KB #14713 +4635)
    (tracking overhead=20367KB +361KB)

- Arena Chunk (reserved=28582KB -31KB, committed=28582KB -31KB)
    (malloc=28582KB -31KB)

- Unknown (reserved=4096KB, committed=0KB)
    (mmap: reserved=4096KB, committed=0KB)
{code}
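A small parser can rank the categories above by committed memory, which saves reading the numbers off by eye. The sketch below is a hypothetical helper: NmtCommittedRanking is not part of Flink or the JDK. It assumes the summary layout shown above, with one top-level line per category of the form "- Name (reserved=...KB, ..., committed=...KB" and a single "Total:" line. Indented detail lines are ignored.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flink or the JDK: ranks the top-level categories of an
// NMT summary (layout as in the output above) by their committed size.
public class NmtCommittedRanking {

    /** One top-level NMT category and its committed size in KB. */
    public static final class Category {
        public final String name;
        public final long committedKb;

        Category(String name, long committedKb) {
            this.name = name;
            this.committedKb = committedKb;
        }
    }

    // e.g. "- Class (reserved=2342856KB +37546KB, committed=1534368KB +42666KB)"
    private static final Pattern CATEGORY =
            Pattern.compile("^-\\s*(.+?)\\s*\\(reserved=\\d+KB.*?committed=(\\d+)KB");
    // e.g. "Total: reserved=5199583KB +38339KB, committed=3802831KB +44371KB"
    private static final Pattern TOTAL =
            Pattern.compile("^Total:\\s*reserved=\\d+KB.*?committed=(\\d+)KB");

    /** Returns all top-level categories, largest committed size first. */
    public static List<Category> rank(String nmtOutput) {
        List<Category> categories = new ArrayList<>();
        for (String line : nmtOutput.split("\\R")) {
            // Detail lines such as "(mmap: ...)" start with '(' and are skipped.
            Matcher m = CATEGORY.matcher(line.trim());
            if (m.find()) {
                categories.add(new Category(m.group(1), Long.parseLong(m.group(2))));
            }
        }
        categories.sort((a, b) -> Long.compare(b.committedKb, a.committedKb));
        return categories;
    }

    /** Returns the committed total from the "Total:" line, or -1 if there is none. */
    public static long totalCommittedKb(String nmtOutput) {
        for (String line : nmtOutput.split("\\R")) {
            Matcher m = TOTAL.matcher(line.trim());
            if (m.find()) {
                return Long.parseLong(m.group(1));
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Top-level lines copied from the NMT output above; detail lines are omitted.
        String output = String.join("\n",
                "Total: reserved=5199583KB +38339KB, committed=3802831KB +44371KB",
                "- Java Heap (reserved=2097152KB, committed=1575936KB)",
                "- Class (reserved=2342856KB +37546KB, committed=1534368KB +42666KB)",
                "- Thread (reserved=48453KB -969KB, committed=48453KB -969KB)",
                "- Code (reserved=287969KB +111KB, committed=244357KB +1023KB)",
                "- GC (reserved=146916KB +24KB, committed=127580KB +24KB)",
                "- Compiler (reserved=442KB, committed=442KB)",
                "- Internal (reserved=177788KB +890KB, committed=177784KB +890KB)",
                "- Symbol (reserved=43952KB +93KB, committed=43952KB +93KB)",
                "- Native Memory Tracking (reserved=21378KB +675KB, committed=21378KB +675KB)",
                "- Arena Chunk (reserved=28582KB -31KB, committed=28582KB -31KB)",
                "- Unknown (reserved=4096KB, committed=0KB)");
        long total = totalCommittedKb(output);
        for (Category c : rank(output)) {
            // Print each category's committed size and its share of the committed total.
            System.out.printf("%-24s %9d KB %6.1f%%%n",
                    c.name, c.committedKb, 100.0 * c.committedKb / total);
        }
    }
}
{code}

When run on the output above, Java Heap and Class are the two largest categories. The next largest is Code, at 244357KB.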
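It seems the heap part and the class part account for most of the memory
consumption: together they commit 3110304KB (1575936KB + 1534368KB), which is
about 82% of the 3802831KB committed in total.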
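The heap and class metadata dominate, and with reuseForks = true the same forks run test class after test class. A trace of both values taken per test class could therefore show where the growth starts. Below is a minimal sketch that uses only the standard java.lang.management API. JvmMemorySnapshot is a hypothetical name, and the idea of calling it after each test class is an assumption; it is not what the watchdog change in the PR above does. The sketch only sees the heap and the non-heap pools. It does not cover thread stacks, GC structures or malloc'd memory, which NMT also reports.

{code:java}
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical sketch, not part of Flink or the surefire plugin: a snapshot of the heap,
// the non-heap pools and the class-loading counters of the current JVM.
public class JvmMemorySnapshot {

    public final long heapUsedBytes;
    public final long heapCommittedBytes;
    public final long nonHeapCommittedBytes; // on HotSpot: Metaspace, compressed class space, code cache
    public final int loadedClassCount;       // classes currently loaded
    public final long totalLoadedClassCount; // classes loaded since JVM start
    public final long unloadedClassCount;    // classes unloaded since JVM start

    private JvmMemorySnapshot(long heapUsedBytes, long heapCommittedBytes,
            long nonHeapCommittedBytes, int loadedClassCount,
            long totalLoadedClassCount, long unloadedClassCount) {
        this.heapUsedBytes = heapUsedBytes;
        this.heapCommittedBytes = heapCommittedBytes;
        this.nonHeapCommittedBytes = nonHeapCommittedBytes;
        this.loadedClassCount = loadedClassCount;
        this.totalLoadedClassCount = totalLoadedClassCount;
        this.unloadedClassCount = unloadedClassCount;
    }

    public static JvmMemorySnapshot take() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ClassLoadingMXBean classLoading = ManagementFactory.getClassLoadingMXBean();
        // Read each usage once so that "used" and "committed" come from the same sample.
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        return new JvmMemorySnapshot(
                heap.getUsed(), heap.getCommitted(), nonHeap.getCommitted(),
                classLoading.getLoadedClassCount(),
                classLoading.getTotalLoadedClassCount(),
                classLoading.getUnloadedClassCount());
    }

    private static long mb(long bytes) {
        return bytes / (1024 * 1024);
    }

    @Override
    public String toString() {
        return String.format(
                "heapUsed=%dMB heapCommitted=%dMB nonHeapCommitted=%dMB"
                        + " loadedClasses=%d (totalLoaded=%d, unloaded=%d)",
                mb(heapUsedBytes), mb(heapCommittedBytes), mb(nonHeapCommittedBytes),
                loadedClassCount, totalLoadedClassCount, unloadedClassCount);
    }

    public static void main(String[] args) {
        System.out.println(take());
    }
}
{code}

Suppose loadedClasses keeps rising from one test class to the next while unloaded stays roughly flat. That would suggest classes are not being released between test classes. heapUsed alone is noisier, because it also includes garbage that has not been collected yet.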
> Exit code 137 returned from process
> -----------------------------------
>
> Key: FLINK-18356
> URL: https://issues.apache.org/jira/browse/FLINK-18356
> Project: Flink
> Issue Type: Bug
> Components: Build System / Azure Pipelines, Tests
> Affects Versions: 1.12.0, 1.13.0, 1.14.0, 1.15.0
> Reporter: Piotr Nowojski
> Assignee: Dawid Wysakowicz
> Priority: Blocker
> Labels: pull-request-available, test-stability
> Fix For: 1.15.0
>
>
> {noformat}
> ============================= test session starts ==============================
> platform linux -- Python 3.7.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
> cachedir: .tox/py37-cython/.pytest_cache
> rootdir: /__w/3/s/flink-python
> collected 568 items
> pyflink/common/tests/test_configuration.py .......... [  1%]
> pyflink/common/tests/test_execution_config.py ....................... [  5%]
> pyflink/dataset/tests/test_execution_environment.py .
> ##[error]Exit code 137 returned from process: file name '/bin/docker', arguments 'exec -i -u 1002 97fc4e22522d2ced1f4d23096b8929045d083dd0a99a4233a8b20d0489e9bddb /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
> Finishing: Test - python
> {noformat}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3729&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=8d78fe4f-d658-5c70-12f8-4921589024c3
--
This message was sent by Atlassian Jira
(v8.20.1#820001)