[
https://issues.apache.org/jira/browse/FLINK-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667327#comment-15667327
]
ASF GitHub Bot commented on FLINK-5056:
---------------------------------------
Github user zentol commented on a diff in the pull request:
https://github.com/apache/flink/pull/2797#discussion_r88016998
--- Diff:
flink-streaming-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java
---
@@ -264,27 +282,23 @@
//
-------------------------------------------ยง-------------------------------------------------
/**
- * Our subtask index, retrieved from the {@code RuntimeContext} in
{@link #open}.
- */
- private transient int subtaskIndex;
-
- /**
- * We use reflection to get the .truncate() method, this is only
available starting with
- * Hadoop 2.7
+ * We use reflection to get the .truncate() method, this is only
available starting with Hadoop 2.7
*/
private transient Method refTruncate;
/**
- * The state object that is handled by flink from snapshot/restore. In
there we store state for
- * every open bucket: the current part file path, the valid length of
the in-progress files and
- * pending part files.
+ * The state object that is handled by Flink from snapshot/restore.
This contains state for
+ * every open bucket: the current {@code in-progress} part file path,
its valid length and
+ * the {@code pending} part files.
--- End diff --
why the annotation around pending?
> BucketingSink deletes valid data when checkpoint notification is slow.
> ----------------------------------------------------------------------
>
> Key: FLINK-5056
> URL: https://issues.apache.org/jira/browse/FLINK-5056
> Project: Flink
> Issue Type: Bug
> Components: filesystem-connector
> Affects Versions: 1.1.3
> Reporter: Kostas Kloudas
> Assignee: Kostas Kloudas
> Fix For: 1.2.0
>
>
> Currently if BucketingSink receives no data after a checkpoint and then a
> notification about a previous checkpoint arrives, it clears its state. This
> can
> lead to not committing valid data about intermediate checkpoints for whom
> a notification has not arrived yet. As a simple sequence that illustrates the
> problem:
> -> input data
> -> snapshot(0)
> -> input data
> -> snapshot(1)
> -> no data
> -> notifyCheckpointComplete(0)
> the last will clear the state of the Sink without committing as final the
> data
> that arrived for checkpoint 1.
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)