[
https://issues.apache.org/jira/browse/FLINK-13962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhu Zhu updated FLINK-13962:
----------------------------
Description:
Currently the _taskRestore_ field of an _Execution_ is reset to null in the task
deployment stage.
The purpose is that it "allows the JobManagerTaskRestore instance to be garbage
collected. Furthermore, it won't be archived along with the Execution in the
ExecutionVertex in case of a restart. This is especially important when setting
state.backend.fs.memory-threshold to larger values because every state below
this threshold will be stored in the meta state files and, thus, also the
JobManagerTaskRestore instances." (From FLINK-9693)
However, if a task fails before it reaches the deployment stage (e.g. due to a
slot allocation timeout), the _taskRestore_ field remains non-null and is
archived in prior executions.
This may result in a large JobManager (JM) heap cost in certain cases and lead
to continuous JM full GCs.
I propose setting the _taskRestore_ field to null before moving an
_Execution_ to prior executions.
We can keep the logic that sets the _taskRestore_ field to null after task
deployment, which allows it to be GC'ed earlier in normal cases.
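A minimal sketch of the proposal, not Flink's actual code: the classes below are
simplified, hypothetical stand-ins that reuse the names Execution and
ExecutionVertex only for readability. The methods clearTaskRestore,
holdsTaskRestore and resetForNewExecution, and the TaskRestore class, are
assumptions made for illustration. The sketch shows the reference being cleared
both at deployment (existing behaviour) and just before the attempt is archived
(proposed behaviour).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for JobManagerTaskRestore: holds potentially large state.
final class TaskRestore {
    final byte[] inlinedState;

    TaskRestore(int sizeBytes) {
        this.inlinedState = new byte[sizeBytes];
    }
}

// Simplified execution attempt (illustrative only, not Flink's Execution class).
final class Execution {
    private TaskRestore taskRestore;

    Execution(TaskRestore taskRestore) {
        this.taskRestore = taskRestore;
    }

    // Existing behaviour described in the issue: cleared once the task is deployed.
    void deploy() {
        // ... the restore information would be handed to the task here ...
        taskRestore = null;
    }

    // Proposed addition: allow the owner to drop the reference before archiving.
    void clearTaskRestore() {
        taskRestore = null;
    }

    boolean holdsTaskRestore() {
        return taskRestore != null;
    }
}

// Simplified vertex that archives prior attempts (illustrative only).
final class ExecutionVertex {
    private final List<Execution> priorExecutions = new ArrayList<>();
    private Execution currentExecution;

    ExecutionVertex(Execution first) {
        this.currentExecution = first;
    }

    // Clears the restore reference before moving the attempt to prior executions,
    // so an attempt that failed before deployment does not pin its state handles.
    void resetForNewExecution(TaskRestore restoreForNextAttempt) {
        currentExecution.clearTaskRestore();
        priorExecutions.add(currentExecution);
        currentExecution = new Execution(restoreForNextAttempt);
    }

    Execution getCurrentExecution() {
        return currentExecution;
    }

    List<Execution> getPriorExecutions() {
        return priorExecutions;
    }
}

final class TaskRestoreLeakSketch {
    public static void main(String[] args) {
        ExecutionVertex vertex = new ExecutionVertex(new Execution(new TaskRestore(1024)));
        // The first attempt fails before deploy() is ever called, then the vertex restarts.
        vertex.resetForNewExecution(new TaskRestore(1024));
        System.out.println("prior execution holds taskRestore: "
                + vertex.getPriorExecutions().get(0).holdsTaskRestore());
    }
}
```

In this sketch, clearing in both places is harmless: the deployment-time reset
still frees the reference early in the normal case, and the reset at archiving
time covers attempts that never reached deployment.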
> Task state handles leak if the task fails before deploying
> ----------------------------------------------------------
>
> Key: FLINK-13962
> URL: https://issues.apache.org/jira/browse/FLINK-13962
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.9.0, 1.10.0
> Reporter: Zhu Zhu
> Priority: Major