[
https://issues.apache.org/jira/browse/FLINK-32532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740456#comment-17740456
]
Matthias Pohl edited comment on FLINK-32532 at 7/6/23 7:22 AM:
---------------------------------------------------------------
The failure you describe happened on agent {{AlibabaCI005-agent01}} on Jul 03
at 15:33:45. I checked the CI builds you reported in FLINK-18356. There is a
[137 exit code CI
failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=50841&view=logs&j=0c940707-2659-5648-cbe6-a1ad63045f0a&t=075c2716-8010-5565-fe08-3c4bb45824a4&l=11872]
(you reported it in [this
comment|https://issues.apache.org/jira/browse/FLINK-18356?focusedCommentId=17739727&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17739727])
in the {{flink-table}} module on {{AlibabaCI005-agent04}} (i.e. the same VM)
on Jul 03 at 15:32:38.
An OOM with exit code 137 crashes all JVM processes on the same machine. We
have seen this before, and each time a failing CI build in {{flink-table}} was
involved. That led us to conclude that FLINK-18356 is the most likely cause of
the OOM. Therefore, you might want to close this Jira issue as a duplicate of
FLINK-18356. It is still important to link the two Jiras so that we can trace
issues back in case FLINK-18356 is not the only cause of the OOM.
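For reference, the reasoning above can be sketched in a few lines of Python.
This is only an illustration, not something the CI runs: the helper names
{{describe_exit_code}} and {{same_vm}} are hypothetical, the
"<vm>-agentNN" naming scheme is inferred from the two agent names quoted
above, and the year 2023 is assumed from the date of this comment. By shell
convention an exit code above 128 means "killed by signal (code - 128)", so
137 maps to signal 9, which is SIGKILL on Linux.

```python
import signal
from datetime import datetime


def describe_exit_code(code: int) -> str:
    """Explain a shell-style exit code (hypothetical helper)."""
    if code > 128:
        # Convention: 128 + N means the process was killed by signal N.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number unknown on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


def same_vm(agent_a: str, agent_b: str) -> bool:
    """Assumption: agent names follow "<vm>-agentNN", so the part before
    the last "-" identifies the VM."""
    return agent_a.rsplit("-", 1)[0] == agent_b.rsplit("-", 1)[0]


# Timestamps of the two failures quoted above (year assumed to be 2023).
s3_failure = datetime(2023, 7, 3, 15, 33, 45)     # AlibabaCI005-agent01
table_failure = datetime(2023, 7, 3, 15, 32, 38)  # AlibabaCI005-agent04
gap_seconds = abs((s3_failure - table_failure).total_seconds())

print(describe_exit_code(137))
print(same_vm("AlibabaCI005-agent01", "AlibabaCI005-agent04"))
print(gap_seconds)
```

The two failures land on the same VM roughly a minute apart, which is why they
are treated here as one OOM event rather than two independent ones.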
> exit code 137 (i.e. OutOfMemoryError) in flink-s3-fs-hadoop module
> ------------------------------------------------------------------
>
> Key: FLINK-32532
> URL: https://issues.apache.org/jira/browse/FLINK-32532
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Hadoop Compatibility
> Affects Versions: 1.16.3
> Reporter: Sergey Nuyanzin
> Priority: Critical
> Labels: test-stability
>
> This build
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=50840&view=logs&j=4eda0b4a-bd0d-521a-0916-8285b9be9bb5&t=2ff6d5fa-53a6-53ac-bff7-fa524ea361a9&l=16093
> is failing like
> {noformat}
> Jul 03 15:33:35 [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time
> elapsed: 15.267 s - in org.apache.flink.fs.s3hadoop.HadoopS3FileSystemITCase
> Jul 03 15:33:45 [ERROR] Picked up JAVA_TOOL_OPTIONS:
> -XX:+HeapDumpOnOutOfMemoryError
> ##[error]Exit code 137 returned from process: file name '/bin/docker',
> arguments 'exec -i -u 1000 -w /home/agent01_azpcontainer
> 3e9ac5dd969222db5673644f5c729d323f624390f9dbc3238a1c99b1b3c4679b
> /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
> Finishing: Test - connect_1
> {noformat}