tweise commented on a change in pull request #6980: [FLINK-5697] [kinesis] Add
periodic per-shard watermark support
URL: https://github.com/apache/flink/pull/6980#discussion_r235149070
##########
File path:
flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/KinesisDataFetcher.java
##########
@@ -609,7 +667,115 @@ public int registerNewSubscribedShardState(KinesisStreamShardState newSubscribed
this.numberOfActiveShards.incrementAndGet();
}
- return subscribedShardsState.size() - 1;
+ int shardStateIndex = subscribedShardsState.size() - 1;
+
+ // track all discovered shards for watermark determination
+ ShardWatermarkState sws = shardWatermarks.get(shardStateIndex);
+ if (sws == null) {
+ sws = new ShardWatermarkState();
+ try {
+ sws.periodicWatermarkAssigner = InstantiationUtil.clone(periodicWatermarkAssigner);
+ } catch (Exception e) {
+ throw new RuntimeException(e);
+ }
+ sws.lastUpdated = getCurrentTimeMillis();
+ sws.lastRecordTimestamp = Long.MIN_VALUE;
+ shardWatermarks.put(shardStateIndex, sws);
+ }
+
+ return shardStateIndex;
+ }
+ }
+
+ /**
+ * Return the current system time. Allow tests to override this to simulate progress for watermark
+ * logic.
+ *
+ * @return
+ */
+ @VisibleForTesting
+ protected long getCurrentTimeMillis() {
+ return System.currentTimeMillis();
+ }
+
+ /**
+ * Called periodically to emit a watermark. Checks all shards for the current event time
+ * watermark, and possibly emits the next watermark.
+ *
+ * <p>Shards that have not received an update for a certain interval are considered inactive so as
+ * to not hold back the watermark indefinitely. When all shards are inactive, the subtask will be
+ * marked as temporarily idle to not block downstream operators.
+ */
+ @VisibleForTesting
+ protected void emitWatermark() {
+ LOG.debug(
+ "###evaluating watermark for subtask {} time {}",
+ indexOfThisConsumerSubtask,
+ getCurrentTimeMillis());
+ long potentialWatermark = Long.MAX_VALUE;
+ long idleTime =
+ (shardIdleIntervalMillis > 0)
+ ? getCurrentTimeMillis() - shardIdleIntervalMillis
+ : Long.MAX_VALUE;
+
+ for (Map.Entry<Integer, ShardWatermarkState> e : shardWatermarks.entrySet()) {
+ // consider only active shards, or those that would advance the watermark
+ Watermark w = e.getValue().periodicWatermarkAssigner.getCurrentWatermark();
+ if (w != null && (e.getValue().lastUpdated >= idleTime || w.getTimestamp() > lastWatermark)) {
+ potentialWatermark = Math.min(potentialWatermark, w.getTimestamp());
+ }
+ }
+
+ // advance watermark if possible (watermarks can only be ascending)
+ if (potentialWatermark == Long.MAX_VALUE) {
Review comment:
The potential watermark is the result of the preceding loop. The idle
condition should only be evaluated when that loop found no potential
watermark, i.e. when potentialWatermark is still Long.MAX_VALUE.
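
To make the ordering concrete, here is a minimal standalone sketch of the selection loop and the decision that follows it. It is not the connector code: `ShardWatermarkSketch`, `ShardState`, `Action`, `potentialWatermark` and `decide` are hypothetical names introduced for illustration, and the per-shard watermark is reduced to a nullable `Long` instead of Flink's `Watermark` type. The filter condition mirrors the one in the diff above.

```java
import java.util.Map;

// Hypothetical, simplified model of the per-shard watermark logic in the diff.
final class ShardWatermarkSketch {

    /** Stand-in for ShardWatermarkState: last update time plus current watermark. */
    static final class ShardState {
        final long lastUpdated; // wall-clock millis of the last record seen
        final Long watermark;   // null when the shard has no watermark yet

        ShardState(long lastUpdated, Long watermark) {
            this.lastUpdated = lastUpdated;
            this.watermark = watermark;
        }
    }

    /** What the caller should do after the loop (names are assumptions). */
    enum Action { EMIT, HOLD, CHECK_IDLE }

    /**
     * Minimum watermark over shards that are active or would advance the
     * watermark. Returns Long.MAX_VALUE when no shard qualifies.
     */
    static long potentialWatermark(
            Map<Integer, ShardState> shards,
            long now,
            long shardIdleIntervalMillis,
            long lastWatermark) {
        // Same threshold computation as in the diff.
        long idleTime = (shardIdleIntervalMillis > 0)
                ? now - shardIdleIntervalMillis
                : Long.MAX_VALUE;
        long potential = Long.MAX_VALUE;
        for (ShardState s : shards.values()) {
            if (s.watermark != null
                    && (s.lastUpdated >= idleTime || s.watermark > lastWatermark)) {
                potential = Math.min(potential, s.watermark);
            }
        }
        return potential;
    }

    /** The idle check is reached only when the loop produced no candidate. */
    static Action decide(long potential, long lastWatermark) {
        if (potential == Long.MAX_VALUE) {
            return Action.CHECK_IDLE;
        }
        // Watermarks can only ascend.
        return potential > lastWatermark ? Action.EMIT : Action.HOLD;
    }
}
```

With one recently updated shard at watermark 50 and one stale shard at watermark 10, a last emitted watermark of 20 and an idle interval of 500 ms, the stale shard is skipped and the sketch returns 50, so the caller would emit. If only the stale shard remains, the loop returns `Long.MAX_VALUE` and only then does the idle branch apply.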