[ 
https://issues.apache.org/jira/browse/FLINK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17485095#comment-17485095
 ] 

Yun Gao edited comment on FLINK-18356 at 2/1/22, 8:08 AM:
----------------------------------------------------------

After some more checks: CompileUtils#COMPILED_CACHE holds the 
SubmoduleClassLoader because it records (classloader, compiled generated class) 
pairs in its cache. Most of the cached classloaders are instances of 
org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader,
 which reference the SubmoduleClassLoader via acc: 
java.security.AccessControlContext -> context: java.security.ProtectionDomain[] 
-> one protection domain whose code source is 
/tmp/flink-rpc-akka_7fe1ccfe-c36e-4aae-acd0-1c1922bf014f.jar, loaded by 
SubmoduleContextClassLoader. 
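
To illustrate the retention pattern described above, here is a minimal sketch of 
a static cache keyed by classloader. It is not Flink's actual CompileUtils code: 
the class StaticCacheLeakSketch and its methods remember, retains and release 
are hypothetical names, and the cache is a plain HashMap chosen only for 
illustration. It shows that an entry keeps its classloader strongly reachable 
until the entry is removed explicitly, even after the loader has been closed.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a static compile cache; not Flink's CompileUtils. */
public class StaticCacheLeakSketch {

    // A static map lives as long as the class that declares it, so every key
    // (a classloader) stays strongly reachable until its entry is removed.
    private static final Map<ClassLoader, Map<String, Class<?>>> CACHE = new HashMap<>();

    /** Records a "compiled" class under the classloader that requested it. */
    public static Class<?> remember(ClassLoader loader, String name, Class<?> compiled) {
        CACHE.computeIfAbsent(loader, k -> new HashMap<>()).put(name, compiled);
        return compiled;
    }

    /** Returns true while the cache still references the given classloader. */
    public static boolean retains(ClassLoader loader) {
        return CACHE.containsKey(loader);
    }

    /** Drops all entries of the given classloader, e.g. when a job finishes. */
    public static void release(ClassLoader loader) {
        CACHE.remove(loader);
    }

    public static void main(String[] args) throws Exception {
        // Assumption: an empty URLClassLoader plays the role of a user classloader.
        URLClassLoader userLoader = new URLClassLoader(new URL[0]);
        remember(userLoader, "GeneratedClass", Object.class);

        // Closing the loader does not remove it from the static cache.
        userLoader.close();
        System.out.println("retained after close: " + retains(userLoader));

        // Only an explicit removal breaks the reference chain.
        release(userLoader);
        System.out.println("retained after release: " + retains(userLoader));
    }
}
```

A cache with weak keys or with a bounded lifetime would be another way to avoid 
the retention; which option fits COMPILED_CACHE would still need to be checked 
against the real code.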

Proposed next steps:
1. Network buffers: I think this is probably not an issue in practice, since 
in every deployment mode the network buffer pool is created only once. 
2. SubmoduleClassLoader (whether it needs to be shared, and whether it is 
expected that CompileUtils#COMPILED_CACHE references it): perhaps [~chesnay] 
could take a second look?
3. CompileUtils#COMPILED_CACHE and other static fields: I'll open new issues. 
We still need to confirm whether they hold on to the user classloader (and 
cause a class leak) in realistic setups. If so, we might need to fix them with 
the OLAP requirements in mind.
4. We could also change TableEnvironmentITCase to share a single mini-cluster, 
or check whether the mini-cluster can be shared across multiple tests. 


> Exit code 137 returned from process
> -----------------------------------
>
>                 Key: FLINK-18356
>                 URL: https://issues.apache.org/jira/browse/FLINK-18356
>             Project: Flink
>          Issue Type: Bug
>          Components: Build System / Azure Pipelines, Tests
>    Affects Versions: 1.12.0, 1.13.0, 1.14.0, 1.15.0
>            Reporter: Piotr Nowojski
>            Assignee: Chesnay Schepler
>            Priority: Blocker
>              Labels: pull-request-available, test-stability
>             Fix For: 1.15.0
>
>         Attachments: 1234.jpg, app-profiling_4.gif
>
>
> {noformat}
> ============================= test session starts ==============================
> platform linux -- Python 3.7.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
> cachedir: .tox/py37-cython/.pytest_cache
> rootdir: /__w/3/s/flink-python
> collected 568 items
> pyflink/common/tests/test_configuration.py ..........                    [  1%]
> pyflink/common/tests/test_execution_config.py .......................    [  5%]
> pyflink/dataset/tests/test_execution_environment.py .
> ##[error]Exit code 137 returned from process: file name '/bin/docker', arguments 'exec -i -u 1002 97fc4e22522d2ced1f4d23096b8929045d083dd0a99a4233a8b20d0489e9bddb /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
> Finishing: Test - python
> {noformat}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3729&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=8d78fe4f-d658-5c70-12f8-4921589024c3



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
