Dian Fu created FLINK-21552:
-------------------------------

             Summary: The managed memory was not released if exception was 
thrown in createPythonExecutionEnvironment
                 Key: FLINK-21552
                 URL: https://issues.apache.org/jira/browse/FLINK-21552
             Project: Flink
          Issue Type: Bug
          Components: API / Python
    Affects Versions: 1.12.0
            Reporter: Dian Fu
            Assignee: Dian Fu
             Fix For: 1.12.3


If there is exception thrown in createPythonExecutionEnvironment, the job will 
failed with the following exception:
{code}
org.apache.flink.runtime.memory.MemoryAllocationException: Could not created 
the shared memory resource of size 611948962. Not enough memory left to reserve 
from the slot's managed memory.
at 
org.apache.flink.runtime.memory.MemoryManager.lambda$getSharedMemoryResourceForManagedMemory$5(MemoryManager.java:536)
at 
org.apache.flink.runtime.memory.SharedResources.createResource(SharedResources.java:126)
at 
org.apache.flink.runtime.memory.SharedResources.getOrAllocateSharedResource(SharedResources.java:72)
at 
org.apache.flink.runtime.memory.MemoryManager.getSharedMemoryResourceForManagedMemory(MemoryManager.java:555)
at 
org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:250)
at 
org.apache.flink.streaming.api.operators.python.AbstractPythonFunctionOperator.open(AbstractPythonFunctionOperator.java:113)
at 
org.apache.flink.table.runtime.operators.python.AbstractStatelessFunctionOperator.open(AbstractStatelessFunctionOperator.java:116)
at 
org.apache.flink.table.runtime.operators.python.scalar.AbstractPythonScalarFunctionOperator.open(AbstractPythonScalarFunctionOperator.java:88)
at 
org.apache.flink.table.runtime.operators.python.scalar.AbstractRowDataPythonScalarFunctionOperator.open(AbstractRowDataPythonScalarFunctionOperator.java:70)
at 
org.apache.flink.table.runtime.operators.python.scalar.RowDataPythonScalarFunctionOperator.open(RowDataPythonScalarFunctionOperator.java:59)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:428)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$2(StreamTask.java:543)
at 
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:93)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:533)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:573)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)
at java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.flink.runtime.memory.MemoryReservationException: Could 
not allocate 611948962 bytes, only 0 bytes are remaining. This usually 
indicates that you are requesting more memory than you have reserved. However, 
when running an old JVM version it can also be caused by slow garbage 
collection. Try to upgrade to Java 8u72 or higher if running on an old Java 
version.
at 
org.apache.flink.runtime.memory.UnsafeMemoryBudget.reserveMemory(UnsafeMemoryBudget.java:170)
at 
org.apache.flink.runtime.memory.UnsafeMemoryBudget.reserveMemory(UnsafeMemoryBudget.java:84)
at 
org.apache.flink.runtime.memory.MemoryManager.reserveMemory(MemoryManager.java:423)
at 
org.apache.flink.runtime.memory.MemoryManager.lambda$getSharedMemoryResourceForManagedMemory$5(MemoryManager.java:534)
... 17 more
{code}

The reason is that the reserved managed memory was not added back to the 
MemoryManager when Job failed because of exceptions thrown in 
createPythonExecutionEnvironment. This causes that there is no managed memory 
to allocate during failover.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to