[
https://issues.apache.org/jira/browse/FLINK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17485095#comment-17485095
]
Yun Gao edited comment on FLINK-18356 at 2/1/22, 8:08 AM:
----------------------------------------------------------
After some more checks: CompileUtils#COMPILED_CACHE holds the
SubmoduleClassLoader because it records (classloader, compiled generated class)
pairs in its cache. Most of the cached classloaders are instances of
org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader,
which refer to the SubmoduleClassLoader via acc: java.security.AccessControlContext
-> context: java.security.ProtectionDomain[] -> one protection domain whose code
source is /tmp/flink-rpc-akka_7fe1ccfe-c36e-4aae-acd0-1c1922bf014f.jar, loaded
by SubmoduleContextClassLoader.
For the next steps:
1. For the network buffers, I think this is probably not a realistic issue,
since in all deployment modes the network buffer pool is only created once.
2. For the SubmoduleClassLoader problem (whether we need to share it, and
whether it is expected that CompileUtils#COMPILED_CACHE refers to it), perhaps
[~chesnay] could take a second look?
3. For the CompileUtils#COMPILED_CACHE and other static fields problem, I'll
open new issues. We still need to confirm whether they would hold the
user classloader (and cause a class leak) in practice. If so, we might need to
solve them in consideration of the OLAP requirements.
4. We could also optimize the TableEnvironmentITCase to share the same
mini-cluster, or we could check whether it is possible to share the
mini-cluster across multiple tests.
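
To make the retention described in this comment concrete, here is a minimal
sketch of a static cache keyed by classloader. It is not Flink's CompileUtils
code: the class LoaderKeyedCacheSketch and its methods getOrLoad, retains and
release are hypothetical names used only for illustration. It assumes nothing
beyond the JDK and only shows that a loader used as a key in a static map stays
strongly reachable until its entry is removed.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a static compiled-class cache; NOT Flink's CompileUtils. */
final class LoaderKeyedCacheSketch {

    // Static field: lives as long as the class that declares it. Every
    // ClassLoader used as a key is strongly reachable from here, and so is
    // everything that loader references in turn.
    private static final Map<ClassLoader, Map<String, Class<?>>> CACHE =
            new ConcurrentHashMap<>();

    /** Returns the cached class for (loader, name), loading it on first use. */
    static Class<?> getOrLoad(ClassLoader loader, String name) {
        return CACHE.computeIfAbsent(loader, l -> new ConcurrentHashMap<>())
                .computeIfAbsent(name, n -> {
                    try {
                        return Class.forName(n, false, loader);
                    } catch (ClassNotFoundException e) {
                        throw new IllegalArgumentException(e);
                    }
                });
    }

    /** True while the cache still holds a strong reference to the loader. */
    static boolean retains(ClassLoader loader) {
        return CACHE.containsKey(loader);
    }

    /** Drops all entries for the loader, e.g. once a job has finished. */
    static void release(ClassLoader loader) {
        CACHE.remove(loader);
    }

    public static void main(String[] args) throws Exception {
        // An empty URLClassLoader plays the role of a user-code classloader.
        try (URLClassLoader userLoader = new URLClassLoader(
                new URL[0], LoaderKeyedCacheSketch.class.getClassLoader())) {
            getOrLoad(userLoader, "java.lang.String");
            System.out.println(retains(userLoader)); // true: loader cannot be collected
            release(userLoader);
            System.out.println(retains(userLoader)); // false: cache no longer pins it
        }
    }
}
```

Whether explicit release, weak keys, or a bounded cache is the right remedy
for the real cache is exactly what the follow-up issues would need to decide;
the sketch only illustrates why an entry left in place pins the loader.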
> Exit code 137 returned from process
> -----------------------------------
>
> Key: FLINK-18356
> URL: https://issues.apache.org/jira/browse/FLINK-18356
> Project: Flink
> Issue Type: Bug
> Components: Build System / Azure Pipelines, Tests
> Affects Versions: 1.12.0, 1.13.0, 1.14.0, 1.15.0
> Reporter: Piotr Nowojski
> Assignee: Chesnay Schepler
> Priority: Blocker
> Labels: pull-request-available, test-stability
> Fix For: 1.15.0
>
> Attachments: 1234.jpg, app-profiling_4.gif
>
>
> {noformat}
> ============================= test session starts ==============================
> platform linux -- Python 3.7.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
> cachedir: .tox/py37-cython/.pytest_cache
> rootdir: /__w/3/s/flink-python
> collected 568 items
> pyflink/common/tests/test_configuration.py .......... [  1%]
> pyflink/common/tests/test_execution_config.py ....................... [  5%]
> pyflink/dataset/tests/test_execution_environment.py .
> ##[error]Exit code 137 returned from process: file name '/bin/docker', arguments 'exec -i -u 1002 97fc4e22522d2ced1f4d23096b8929045d083dd0a99a4233a8b20d0489e9bddb /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
> Finishing: Test - python
> {noformat}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3729&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=8d78fe4f-d658-5c70-12f8-4921589024c3
--
This message was sent by Atlassian Jira
(v8.20.1#820001)