[ https://issues.apache.org/jira/browse/FLINK-29985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roman Khachatryan updated FLINK-29985:
--------------------------------------
Description:
When TM is stopped by RM, its slot table is closed, causing all its slots to be
released.
However, when TM is stopped by SIGTERM (e.g. by an external resource manager),
its slot table is NOT closed.
When a slot is released, the associated resources are released as well, in
particular, MemoryManager.
MemoryManager might hold not only memory, but also arbitrary shared resources
(currently, PythonSharedResources and RocksDBSharedResources).
As of now, RocksDBSharedResources contains only ephemeral resources. Not sure
about PythonSharedResources, but likely it is associated with a separate
process.
That means that in standalone clusters, some resources might not be released.
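To illustrate the difference between the two shutdown paths, here is a minimal
sketch. It is not Flink code and not the proposed fix: SharedResource,
SketchSlotTable and SlotTableShutdownSketch are hypothetical stand-ins for the
shared resources held by MemoryManager, the slot table, and the TM process. The
only real API used is the JVM shutdown hook, which the JVM runs when it
receives SIGTERM. The assumption is that closing the slot table from such a
hook would release the slots in the SIGTERM case as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a shared resource (e.g. a handle to a separate process). */
final class SharedResource implements AutoCloseable {
    private final String name;
    private final AtomicBoolean released = new AtomicBoolean(false);

    SharedResource(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    @Override
    public void close() {
        // Idempotent: releasing twice has no further effect.
        released.set(true);
    }

    boolean isReleased() {
        return released.get();
    }
}

/** Hypothetical stand-in for the slot table: closing it releases all held resources. */
final class SketchSlotTable implements AutoCloseable {
    private final List<SharedResource> resources = new ArrayList<>();
    private boolean closed = false;

    synchronized void allocate(SharedResource resource) {
        resources.add(resource);
    }

    @Override
    public synchronized void close() {
        if (closed) {
            return; // already closed, e.g. by the regular stop path
        }
        closed = true;
        for (SharedResource resource : resources) {
            resource.close();
        }
    }

    synchronized boolean isClosed() {
        return closed;
    }
}

final class SlotTableShutdownSketch {

    /** Registers a JVM shutdown hook that closes the table; the JVM runs it on SIGTERM. */
    static Thread registerShutdownHook(SketchSlotTable table) {
        Thread hook = new Thread(table::close, "slot-table-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        SketchSlotTable table = new SketchSlotTable();
        table.allocate(new SharedResource("python-process"));
        registerShutdownHook(table);
        // When the JVM exits (normally or on SIGTERM), the hook closes the table.
        System.out.println("slot table closed before exit: " + table.isClosed());
    }
}
```

Because close() is idempotent in this sketch, a stop initiated by RM and a
later shutdown hook can both call it without releasing resources twice.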
was:
When a slot is released, the associated resources are released as well, in
particular, MemoryManager. MemoryManager might hold not only memory, but also
some arbitrary shared resources (currently, PythonSharedResources and
RocksDBSharedResources).
When TM is stopped by JManager, its slot table is closed, causing all its slots
to be released.
When TM is stopped by SIGTERM (i.e. external resource manager), its slot table
is NOT closed.
That means that in standalone clusters, some resources might not be released.
As of now, RocksDBSharedResources contains only ephemeral resources.
Not sure about PythonSharedResources, but likely it is associated with a
separate process.
> TaskManager doesn't close SlotTable on SIGTERM
> ----------------------------------------------
>
> Key: FLINK-29985
> URL: https://issues.apache.org/jira/browse/FLINK-29985
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Task
> Affects Versions: 1.16.0, 1.15.3
> Reporter: Roman Khachatryan
> Priority: Major
>
> When TM is stopped by RM, its slot table is closed, causing all its slots to
> be released.
> However, when TM is stopped by SIGTERM (e.g. by an external resource
> manager), its slot table is NOT closed.
>
> When a slot is released, the associated resources are released as well, in
> particular, MemoryManager.
> MemoryManager might hold not only memory, but also arbitrary shared resources
> (currently, PythonSharedResources and RocksDBSharedResources).
> As of now, RocksDBSharedResources contains only ephemeral resources. Not sure
> about PythonSharedResources, but likely it is associated with a separate
> process.
> That means that in standalone clusters, some resources might not be released.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)