Regarding the two hooks you would like to be available:

Provide hook to override discovery (not to hit Kinesis from every subtask)
Yes, I think we can easily provide a way, for example setting -1 for 
SHARD_DISCOVERY_INTERVAL_MILLIS, to disable shard discovery.
Though, the user would then have to savepoint and restore in order to pick up 
new shards after a Kinesis stream reshard (which is in practice the best way to 
by-pass the Kinesis API rate limitations).
+1 to provide that.

Provide hook to support custom watermark generation (somewhere around 
KinesisDataFetcher.emitRecordAndUpdateState)
Per-partition watermark generation on the Kinesis side is slightly more complex 
than Kafka, due to how Kinesis’s dynamic resharding works.
I think we need to additionally allow new shards to be consumed only after its 
parent shard is fully read, otherwise “per-shard time characteristics” can be 
broken because of this out-of-orderness consumption across the boundaries of a 
closed parent shard and its child.
There theses JIRAs [1][2] which has a bit more details on the topic.
Otherwise, in general I’m also +1 to providing this also in the Kinesis 
consumer.

Cheers,
Gordon

[1] https://issues.apache.org/jira/browse/FLINK-5697
[2] https://issues.apache.org/jira/browse/FLINK-6349

On 8 February 2018 at 1:48:23 AM, Thomas Weise (t...@apache.org) wrote:

Provide hook to override discovery (not to hit Kinesis from every subtask)
Provide hook to support custom watermark generation (somewhere around 
KinesisDataFetcher.emitRecordAndUpdateState)

Reply via email to