[
https://issues.apache.org/jira/browse/FLINK-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tzu-Li (Gordon) Tai updated FLINK-4020:
---------------------------------------
Description:
Currently, FlinkKinesisConsumer queries the whole list of shards in its
constructor, which forces the client to have access to Kinesis as well. This is
also a drawback for handling Kinesis-side resharding: we want all shard
listing, shard-to-task assignment, and shard-end handling (a shard ends as a
result of resharding) to be done independently within task life cycle methods,
with well-defined, deterministic results.
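As a rough illustration of shard-to-task assignment that each subtask can
compute on its own inside a life cycle method, here is a minimal sketch. The
class ShardAssigner and the method shardsForSubtask are hypothetical names
made up for this example and are not part of the connector; the sketch also
assumes every subtask sees the same shard list, which resharding can violate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for illustration; not part of the Flink Kinesis connector. */
public class ShardAssigner {

    /**
     * Returns the shards owned by one subtask. Each subtask sorts the same
     * shard list and takes every parallelism-th entry starting at its own
     * index, so the result is deterministic and needs no communication
     * between subtasks (assuming all subtasks see the same list).
     */
    public static List<String> shardsForSubtask(
            List<String> allShardIds, int subtaskIndex, int parallelism) {
        if (parallelism <= 0 || subtaskIndex < 0 || subtaskIndex >= parallelism) {
            throw new IllegalArgumentException("invalid subtask index or parallelism");
        }
        // Sort a copy so the assignment does not depend on listing order.
        List<String> sorted = new ArrayList<>(allShardIds);
        Collections.sort(sorted);

        List<String> mine = new ArrayList<>();
        for (int i = subtaskIndex; i < sorted.size(); i += parallelism) {
            mine.add(sorted.get(i));
        }
        return mine;
    }
}
```

In a Flink source this could be called from open() rather than the
constructor, with the subtask index and parallelism taken from the runtime
context, so that only the task managers need access to Kinesis.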
The main challenge is coordination between parallel subtasks. Because of
Amazon's operation rate limits, every subtask will need to retry until all
subtasks have succeeded. We could probably use either ZooKeeper (ZK) or Amazon
DynamoDB (user configurable) to coordinate subtask status.
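The per-subtask retry mentioned above could look roughly like the sketch
below, which retries an operation with exponential backoff. BackoffRetry and
its parameters are hypothetical names for this example only; the operation and
the "is this a rate-limit error" check are passed in by the caller, so nothing
here depends on a specific AWS client. It covers only the retry inside a
single subtask, not the cross-subtask coordination through ZK or DynamoDB.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical helper for illustration; not part of the Flink Kinesis connector. */
public class BackoffRetry {

    /**
     * Calls the operation until it succeeds, a non-retryable exception is
     * thrown, or maxAttempts is reached. The wait between attempts doubles
     * each time, capped at maxDelayMillis.
     */
    public static <T> T call(
            Callable<T> operation,
            Predicate<Exception> isRetryable,
            int maxAttempts,
            long baseDelayMillis,
            long maxDelayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = baseDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                // Give up on the last attempt or on errors that are not
                // rate-limit related (as judged by the caller's predicate).
                if (attempt >= maxAttempts || !isRetryable.test(e)) {
                    throw e;
                }
                Thread.sleep(delay);
                delay = Math.min(maxDelayMillis, delay * 2);
            }
        }
    }
}
```

A subtask would wrap its shard-listing call in BackoffRetry.call(...) and pass
a predicate that recognizes the rate-limit exception of whichever Kinesis
client it uses.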
was:Currently FlinkKinesisConsumer is querying for the whole list of shards
in the constructor, forcing the client to be able to access Kinesis as well.
This is also a drawback for handling Kinesis-side resharding, since we'd want
all shard listing / shard-to-task assigning / shard end (result of resharding)
handling logic to be capable of being independently done within task life cycle
methods, with defined and definite results.
> Remove shard list querying from Kinesis consumer constructor
> ------------------------------------------------------------
>
> Key: FLINK-4020
> URL: https://issues.apache.org/jira/browse/FLINK-4020
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming Connectors
> Reporter: Tzu-Li (Gordon) Tai