[
https://issues.apache.org/jira/browse/FLINK-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352370#comment-15352370
]
ASF GitHub Bot commented on FLINK-3231:
---------------------------------------
Github user tzulitai commented on a diff in the pull request:
https://github.com/apache/flink/pull/2131#discussion_r68698386
--- Diff: flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java ---
@@ -86,32 +144,25 @@ public KinesisProxy(Properties configProps) {
 	 * @param maxRecordsToGet the maximum amount of records to retrieve for this batch
 	 * @return the batch of retrieved records
 	 */
-	public GetRecordsResult getRecords(String shardIterator, int maxRecordsToGet) {
+	@Override
+	public GetRecordsResult getRecords(String shardIterator, int maxRecordsToGet) throws InterruptedException {
 		final GetRecordsRequest getRecordsRequest = new GetRecordsRequest();
 		getRecordsRequest.setShardIterator(shardIterator);
 		getRecordsRequest.setLimit(maxRecordsToGet);
 		GetRecordsResult getRecordsResult = null;
-		int remainingRetryTimes = Integer.valueOf(
-			configProps.getProperty(KinesisConfigConstants.CONFIG_STREAM_DESCRIBE_RETRIES, Integer.toString(KinesisConfigConstants.DEFAULT_STREAM_DESCRIBE_RETRY_TIMES)));
-		long describeStreamBackoffTimeInMillis = Long.valueOf(
-			configProps.getProperty(KinesisConfigConstants.CONFIG_STREAM_DESCRIBE_BACKOFF, Long.toString(KinesisConfigConstants.DEFAULT_STREAM_DESCRIBE_BACKOFF)));
-
-		int i=0;
-		while (i <= remainingRetryTimes && getRecordsResult == null) {
+		int attempt = 0;
+		while (attempt <= getRecordsMaxAttempts && getRecordsResult == null) {
 			try {
 				getRecordsResult = kinesisClient.getRecords(getRecordsRequest);
 			} catch (ProvisionedThroughputExceededException ex) {
+				long backoffMillis = fullJitterBackoff(
+					getRecordsBaseBackoffMillis, getRecordsMaxBackoffMillis, getRecordsExpConstant, attempt++);
 				LOG.warn("Got ProvisionedThroughputExceededException. Backing off for "
-					+ describeStreamBackoffTimeInMillis + " millis.");
-				try {
-					Thread.sleep(describeStreamBackoffTimeInMillis);
-				} catch (InterruptedException interruptEx) {
-					//
-				}
+					+ backoffMillis + " millis.");
+				Thread.sleep(backoffMillis);
 			}
-			i++;
 		}
 		if (getRecordsResult == null) {
--- End diff --
My reasoning for failing hard here, but not in the `describeStream()`
operation, on limit / provisioned-throughput-exceeded exceptions is that
rate exceeding for `getRecords()` on a single shard may be caused by several
conditions the user should handle:
1. The user might have other non-Flink consumers concurrently reading the
same shard (Kinesis limits reads to 5 transactions per second per shard). The
user should either slow down the non-Flink consumers, or slow down our
`ShardConsumer` by adjusting the
`KinesisConfigConstants.CONFIG_SHARD_GETRECORDS_IDLE_MILLIS` setting we will
be introducing in https://github.com/apache/flink/pull/2071.
2. The records are too large, so the user needs to adjust
`KinesisConfigConstants.CONFIG_SHARD_GETRECORDS_MAX` to a smaller value.
`describeStream()`, on the other hand, is an internal operation we need to
perform continuously, and we must make sure it returns results for the
connector to work.
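For reference, the `fullJitterBackoff(...)` helper called in the diff presumably implements AWS-style exponential backoff with "full jitter": the wait grows exponentially per attempt, is capped, and the actual sleep is drawn uniformly at random below that cap. A minimal sketch, with the signature and parameter names inferred from the call site (not necessarily the PR's exact implementation):

```java
import java.util.Random;

public class BackoffSketch {
    private static final Random seed = new Random();

    /**
     * Exponential backoff with "full jitter": the backoff ceiling grows as
     * baseMillis * expConstant^attempt, is capped at maxMillis, and the
     * returned wait is drawn uniformly from [0, cappedBackoff).
     */
    public static long fullJitterBackoff(long baseMillis, long maxMillis,
                                         double expConstant, int attempt) {
        long exponentialBackoff =
            (long) Math.min(maxMillis, baseMillis * Math.pow(expConstant, attempt));
        return (long) (seed.nextDouble() * exponentialBackoff);
    }

    public static void main(String[] args) {
        // Successive attempts back off longer on average, but randomly,
        // so concurrent retriers de-synchronize instead of retrying in lockstep.
        for (int attempt = 0; attempt <= 4; attempt++) {
            System.out.println("attempt " + attempt + ": "
                + fullJitterBackoff(100, 1000, 1.5, attempt) + " ms");
        }
    }
}
```

The jitter is the point: without it, many `ShardConsumer` threads that hit the throughput limit together would all retry at the same instant and exceed it again.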
> Handle Kinesis-side resharding in Kinesis streaming consumer
> ------------------------------------------------------------
>
> Key: FLINK-3231
> URL: https://issues.apache.org/jira/browse/FLINK-3231
> Project: Flink
> Issue Type: Sub-task
> Components: Kinesis Connector, Streaming Connectors
> Affects Versions: 1.1.0
> Reporter: Tzu-Li (Gordon) Tai
> Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> A big difference between Kinesis shards and Kafka partitions is that Kinesis
> users can choose to "merge" and "split" shards at any time for adjustable
> stream throughput capacity. This article explains this quite clearly:
> https://brandur.org/kinesis-by-example.
> This will break the static shard-to-task mapping implemented in the basic
> version of the Kinesis consumer
> (https://issues.apache.org/jira/browse/FLINK-3229). The static shard-to-task
> mapping is done in a simple round-robin-like distribution which can be
> locally determined at each Flink consumer task (the Flink Kafka consumer
> does this too).
> To handle Kinesis resharding, we will need some way to let the Flink consumer
> tasks coordinate which shards they are currently handling, and allow a
> task to ask the coordinator for a shard reassignment when it encounters
> a closed shard at runtime (shards are closed by Kinesis when they are merged
> or split).
> We need a centralized coordinator state store which is visible to all Flink
> consumer tasks. Tasks can use this state store to locally determine which
> shards they can be reassigned. Amazon KCL uses a DynamoDB table for the
> coordination, but as described in
> https://issues.apache.org/jira/browse/FLINK-3211, we unfortunately can't use
> KCL for the implementation of the consumer if we want to leverage Flink's
> checkpointing mechanics. For our own implementation, ZooKeeper can be used
> for this state store, but that would require the user to set up ZooKeeper
> as well.
> Since this feature introduces extensive work, it is opened as a separate
> sub-task from the basic implementation
> https://issues.apache.org/jira/browse/FLINK-3229.
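As context for the "static shard-to-task mapping" the issue describes, here is a hypothetical sketch of a round-robin distribution that every subtask can compute locally without any coordination (names are illustrative, not the connector's actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class RoundRobinShardAssignment {
    /**
     * Hypothetical illustration: subtask i of n claims every shard whose
     * position in a globally-ordered shard list is congruent to i mod n.
     * Because every subtask sees the same ordered list, each one can
     * evaluate this locally and the results never overlap.
     */
    public static List<String> assignedShards(List<String> allShards,
                                              int subtaskIndex, int parallelism) {
        List<String> assigned = new ArrayList<>();
        for (int i = 0; i < allShards.size(); i++) {
            if (i % parallelism == subtaskIndex) {
                assigned.add(allShards.get(i));
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<String> shards =
            List.of("shard-0", "shard-1", "shard-2", "shard-3", "shard-4");
        // Subtask 1 of 2 claims the odd-indexed shards.
        System.out.println(assignedShards(shards, 1, 2));
    }
}
```

This static scheme is exactly what resharding breaks: once Kinesis closes, merges, or splits shards, the ordered list changes underneath the running tasks, which is why the issue calls for a coordinated reassignment mechanism.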
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)