[
https://issues.apache.org/jira/browse/FLINK-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Till Rohrmann updated FLINK-5063:
---------------------------------
Description:
If a {{Checkpoint}} is declined or expires, the
{{CheckpointCoordinator}} disposes of the {{PendingCheckpoint}}. Disposing of the
{{PendingCheckpoint}} discards all {{SubtaskStates}} registered so far by
the acknowledged {{Tasks}}. However, acknowledge messages that arrive late
are simply ignored, and the state handles they carry are never discarded.
This can clutter the checkpoint directory, because the checkpoint files
referenced by late or unknown acknowledge messages are never deleted.
I propose that the {{CheckpointCoordinator}} properly discard the state handles
when it receives a late acknowledge message or an acknowledge message for an
unknown {{ExecutionAttemptID}}, provided the message belongs to the job of the
{{CheckpointCoordinator}}. Checkpoint messages belonging to a
different job won't be handled and are simply ignored.
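The proposed handling could look roughly like the sketch below. It is only an illustration of the three cases (accept, discard, ignore) and not the actual patch: {{LateAckSketch}}, {{StateHandle}}, {{AckMessage}}, {{Coordinator}}, {{Outcome}} and all method names are hypothetical stand-ins, and the real {{CheckpointCoordinator}} has considerably more to deal with (duplicate acknowledgements, asynchronous disposal, and so on).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Illustrative sketch only: every type and method name here is hypothetical
// and does not mirror Flink's real classes or signatures.
final class LateAckSketch {

    /** Stand-in for a state handle that owns checkpoint files. */
    interface StateHandle {
        void discard();
    }

    /** Stand-in for an acknowledge-checkpoint message. */
    static final class AckMessage {
        final String jobId;
        final long checkpointId;
        final String executionAttemptId;
        final StateHandle state;

        AckMessage(String jobId, long checkpointId, String executionAttemptId, StateHandle state) {
            this.jobId = jobId;
            this.checkpointId = checkpointId;
            this.executionAttemptId = executionAttemptId;
            this.state = Objects.requireNonNull(state);
        }
    }

    enum Outcome { ACCEPTED, DISCARDED, IGNORED }

    /** Stand-in for a pending checkpoint and the state registered with it so far. */
    static final class Pending {
        final Set<String> expectedAttempts;
        final List<StateHandle> registered = new ArrayList<>();

        Pending(Set<String> expectedAttempts) {
            this.expectedAttempts = expectedAttempts;
        }
    }

    static final class Coordinator {
        private final String jobId;
        private final Map<Long, Pending> pending = new HashMap<>();

        Coordinator(String jobId) {
            this.jobId = jobId;
        }

        void startCheckpoint(long checkpointId, String... expectedAttempts) {
            pending.put(checkpointId, new Pending(new HashSet<>(Arrays.asList(expectedAttempts))));
        }

        /** Declined or expired: drop the pending checkpoint and discard what it registered. */
        void abortCheckpoint(long checkpointId) {
            Pending p = pending.remove(checkpointId);
            if (p != null) {
                for (StateHandle handle : p.registered) {
                    handle.discard();
                }
            }
        }

        Outcome receiveAck(AckMessage msg) {
            // Messages of a different job are not ours to clean up.
            if (!jobId.equals(msg.jobId)) {
                return Outcome.IGNORED;
            }
            Pending p = pending.get(msg.checkpointId);
            if (p != null && p.expectedAttempts.contains(msg.executionAttemptId)) {
                p.registered.add(msg.state);
                return Outcome.ACCEPTED;
            }
            // Late message (checkpoint no longer pending) or unknown attempt:
            // discard the transmitted state instead of silently dropping it.
            msg.state.discard();
            return Outcome.DISCARDED;
        }
    }
}
```

In this sketch, an acknowledgement that arrives after {{abortCheckpoint}} has run, or one that names an attempt the coordinator does not expect, has its state discarded, whereas a message carrying a different job ID is returned as {{IGNORED}} with its state untouched.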
was:
In case that a {{Checkpoint}} is declined or expires, the
{{CheckpointCoordinator}} will dispose the {{PendingCheckpoint}}. Disposing the
{{PendingCheckpoint}} entails that all so far registered {{SubtaskStates}} of
the acknowledged {{Tasks}} are discarded. However, all late arriving
acknowledge messages are simply ignored without properly discard the
transmitted state handles. This can lead to a cluttering of checkpoint
directory since the checkpoint files of late or unknown acknowledge checkpoint
messages are never deleted.
I propose to properly discard the state handles at the
{{CheckpointCoordinator}} if receiving a late acknowledge message or an
acknowledge message for an unknown {{ExecutionAttemptID}} belonging to the job
of the {{CheckpointCoordinator}}. However, checkpoint messages belonging to a
different job won't be handled and simply ignored.
> State handles are not properly cleaned up for declined or expired checkpoints
> -----------------------------------------------------------------------------
>
> Key: FLINK-5063
> URL: https://issues.apache.org/jira/browse/FLINK-5063
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing
> Affects Versions: 1.2.0, 1.1.3
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Priority: Critical
> Fix For: 1.2.0, 1.1.4