[
https://issues.apache.org/jira/browse/FLINK-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839317#comment-15839317
]
ASF GitHub Bot commented on FLINK-4616:
---------------------------------------
Github user tzulitai commented on a diff in the pull request:
https://github.com/apache/flink/pull/3031#discussion_r97935696
--- Diff:
flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/internals/AbstractFetcher.java
---
@@ -175,34 +176,115 @@ protected AbstractFetcher(
 	// ------------------------------------------------------------------------
 	/**
-	 * Takes a snapshot of the partition offsets.
+	 * Takes a snapshot of the partition offsets and watermarks.
 	 *
 	 * <p>Important: This method mus be called under the checkpoint lock.
 	 *
-	 * @return A map from partition to current offset.
+	 * @return A map from partition to current offset and watermark.
 	 */
-	public HashMap<KafkaTopicPartition, Long> snapshotCurrentState() {
+	public HashMap<KafkaTopicPartition, Tuple2<Long, Long>> snapshotCurrentState() {
 		// this method assumes that the checkpoint lock is held
 		assert Thread.holdsLock(checkpointLock);
-		HashMap<KafkaTopicPartition, Long> state = new HashMap<>(allPartitions.length);
-		for (KafkaTopicPartitionState<?> partition : subscribedPartitions()) {
-			state.put(partition.getKafkaTopicPartition(), partition.getOffset());
+		HashMap<KafkaTopicPartition, Tuple2<Long, Long>> state = new HashMap<>(allPartitions.length);
+
+		switch (timestampWatermarkMode) {
+
+			case NO_TIMESTAMPS_WATERMARKS: {
+
+				for (KafkaTopicPartitionState<KPH> partition : allPartitions) {
+					state.put(partition.getKafkaTopicPartition(), Tuple2.of(partition.getOffset(), Long.MIN_VALUE));
+				}
+
+				return state;
+			}
+
+			case PERIODIC_WATERMARKS: {
+				KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH> [] partitions =
+					(KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH> []) allPartitions;
+
+				for (KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH> partition : partitions) {
+					state.put(partition.getKafkaTopicPartition(), Tuple2.of(partition.getOffset(), partition.getCurrentWatermarkTimestamp()));
+				}
+
+				return state;
+			}
+
+			case PUNCTUATED_WATERMARKS: {
+				KafkaTopicPartitionStateWithPunctuatedWatermarks<T, KPH> [] partitions =
+					(KafkaTopicPartitionStateWithPunctuatedWatermarks<T, KPH> []) allPartitions;
+
+				for (KafkaTopicPartitionStateWithPunctuatedWatermarks<T, KPH> partition : partitions) {
+					state.put(partition.getKafkaTopicPartition(), Tuple2.of(partition.getOffset(), partition.getCurrentPartitionWatermark()));
+				}
+
+				return state;
+			}
+
+			default:
+				// cannot happen, add this as a guard for the future
+				throw new RuntimeException();
--- End diff --
It would be good to include a reason message in this exception.
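As a minimal sketch of what such a guard could look like, the standalone class below throws an exception that names the unexpected mode. The class name, the helper method, and the int values of the constants are assumptions made for illustration. It also uses IllegalStateException in place of the bare RuntimeException from the diff, which is a suggestion and not what the pull request does.

```java
public class WatermarkModeGuard {
    // Assumed int constants mirroring the case labels in the diff; the values are illustrative.
    static final int NO_TIMESTAMPS_WATERMARKS = 0;
    static final int PERIODIC_WATERMARKS = 1;
    static final int PUNCTUATED_WATERMARKS = 2;

    // Hypothetical helper: names what would be snapshotted for each mode.
    static String describeSnapshot(int timestampWatermarkMode) {
        switch (timestampWatermarkMode) {
            case NO_TIMESTAMPS_WATERMARKS:
                return "offset with Long.MIN_VALUE as watermark";
            case PERIODIC_WATERMARKS:
                return "offset with current periodic watermark";
            case PUNCTUATED_WATERMARKS:
                return "offset with current punctuated watermark";
            default:
                // Guard for future modes: say what was wrong instead of throwing a bare exception.
                throw new IllegalStateException(
                    "Unknown timestamp/watermark mode: " + timestampWatermarkMode);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeSnapshot(PERIODIC_WATERMARKS));
        try {
            describeSnapshot(42);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a message like this, a stack trace from the default branch would show which mode value triggered it.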
> Kafka consumer doesn't store last emitted watermarks per partition in state
> ---------------------------------------------------------------------------
>
> Key: FLINK-4616
> URL: https://issues.apache.org/jira/browse/FLINK-4616
> Project: Flink
> Issue Type: Bug
> Components: Kafka Connector
> Affects Versions: 1.1.1
> Reporter: Yuri Makhno
> Assignee: Roman Maier
>
> The Kafka consumer stores only Kafka offsets in state and does not store the
> last emitted watermarks, which can lead to incorrect state when a checkpoint
> is restored. Say the watermark is (timestamp - 10). With the following message
> queue, the results after a checkpoint restore will differ from those of
> normal processing:
> A(ts = 30)
> B(ts = 35)
> ------ checkpoint goes here
> C(ts=15) -- this one should be filtered by next time window
> D(ts=60)
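The scenario in the issue can be reproduced with a small standalone sketch. It assumes a simplified lateness rule (an event is dropped when its timestamp is not ahead of the current watermark) and a watermark of (max timestamp - 10); it is not Flink's window operator, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class WatermarkRestoreSketch {
    // Watermark lag taken from the issue description: watermark = timestamp - 10.
    static final long LAG = 10;

    // Returns the timestamps that pass the simplified lateness check,
    // starting from the given watermark.
    static List<Long> accepted(long initialWatermark, long... timestamps) {
        long watermark = initialWatermark;
        List<Long> out = new ArrayList<>();
        for (long ts : timestamps) {
            if (ts > watermark) {
                out.add(ts);
            }
            // The watermark only moves forward.
            watermark = Math.max(watermark, ts - LAG);
        }
        return out;
    }

    public static void main(String[] args) {
        // Normal processing: after A(30) and B(35) the watermark is 25, so C(15) is dropped.
        System.out.println(accepted(25L, 15L, 60L)); // prints [60]
        // Restore without a stored watermark: it restarts at Long.MIN_VALUE, so C(15) passes.
        System.out.println(accepted(Long.MIN_VALUE, 15L, 60L)); // prints [15, 60]
    }
}
```

Under these assumptions, storing the watermark alongside the offset in the checkpoint makes the restored run start from 25 and produce the same result as normal processing.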