[ https://issues.apache.org/jira/browse/FLINK-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093722#comment-16093722 ]

Steffen Hausmann commented on FLINK-6365:
-----------------------------------------

I agree with adapting SHARD_GETRECORDS_MAX, but I would still argue that 
latency would decrease if the connector polled only every 200 ms. 

Kinesis supports 5 GetRecords requests per second per shard, so to keep latency 
low it seems desirable to make one request every 200 ms. Correct me if I'm 
wrong, but this is what I believe happens right now with the default 
SHARD_GETRECORDS_INTERVAL_MILLIS of 0. The first five GetRecords requests 
succeed. However, because the connector makes these requests as quickly as 
possible, they are likely to fall at the beginning of the 1-second interval. 
Any subsequent request within that interval is throttled, so exponential 
backoff waits for some time and retries it, and the retry is unlikely to land 
exactly 1 second after the first request. So either the request is throttled 
again, or it succeeds but later than it could have.
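The burst effect described above can be illustrated with a toy model. This is a minimal sketch, not connector code: it assumes the per-shard limit behaves like a sliding 1-second window that admits 5 calls, assumes a GetRecords call takes about 20 ms when the interval is 0, and ignores the connector's exponential backoff entirely. `ThrottleSimulation` and `countThrottled` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Toy model of the per-shard limit of 5 GetRecords calls per second.
 * Assumption: the limit acts as a sliding 1-second window over accepted calls;
 * real Kinesis accounting may differ, and backoff/retries are not modelled.
 */
public class ThrottleSimulation {

    static final int MAX_CALLS_PER_SECOND = 5;
    static final long WINDOW_MILLIS = 1000;

    /** Counts throttled calls for a consumer that starts a call every periodMillis. */
    static int countThrottled(long periodMillis, long durationMillis) {
        if (periodMillis <= 0) {
            throw new IllegalArgumentException("periodMillis must be positive");
        }
        Deque<Long> accepted = new ArrayDeque<>();
        int throttled = 0;
        for (long t = 0; t < durationMillis; t += periodMillis) {
            // Forget accepted calls that are no longer inside the 1-second window.
            while (!accepted.isEmpty() && t - accepted.peekFirst() >= WINDOW_MILLIS) {
                accepted.pollFirst();
            }
            if (accepted.size() < MAX_CALLS_PER_SECOND) {
                accepted.addLast(t);
            } else {
                throttled++;
            }
        }
        return throttled;
    }

    public static void main(String[] args) {
        // Interval 0: assume each GetRecords call takes ~20 ms, so calls start every 20 ms.
        System.out.println("every 20 ms:  " + countThrottled(20, 10_000) + " throttled");
        // Interval 200 ms: no 1-second window ever contains more than 5 calls.
        System.out.println("every 200 ms: " + countThrottled(200, 10_000) + " throttled");
    }
}
```

Under these assumptions, polling every 20 ms gets 5 calls through at the start of each second and throttles the remaining 45, whereas polling every 200 ms is never throttled.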

To introduce as little latency as possible, the connector would ideally wait 
exactly 200 ms between two GetRecords calls. Looking at the code, the current 
implementation seems to read the records from the stream and then wait 
SHARD_GETRECORDS_INTERVAL_MILLIS. So setting SHARD_GETRECORDS_INTERVAL_MILLIS 
to 200 adds some latency beyond the 200 ms (namely, the time it takes to read 
the records). But even with this implementation, which could be optimized 
further, I would argue that it's desirable to increase 
SHARD_GETRECORDS_INTERVAL_MILLIS to 200.
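The optimization hinted at above, waiting only for whatever remains of the interval after the fetch, might look like the following sketch. `PacedPoller`, `pollLoop` and the `fetchOnce` callback (standing in for one GetRecords call plus record emission) are hypothetical names, not the connector's actual classes.

```java
import java.util.concurrent.TimeUnit;

public class PacedPoller {

    /** Time left in the interval after a fetch that took elapsedMillis; never negative. */
    static long remainingSleepMillis(long intervalMillis, long elapsedMillis) {
        return Math.max(0L, intervalMillis - elapsedMillis);
    }

    /**
     * Hypothetical polling loop: starts a fetch roughly every intervalMillis by
     * sleeping only for the remainder of the interval, instead of sleeping the
     * full interval after each fetch (which adds the fetch time on top).
     */
    static void pollLoop(Runnable fetchOnce, long intervalMillis, int iterations)
            throws InterruptedException {
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            fetchOnce.run(); // stands in for one GetRecords call plus record emission
            long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            long sleepMillis = remainingSleepMillis(intervalMillis, elapsedMillis);
            if (sleepMillis > 0) {
                Thread.sleep(sleepMillis);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long begin = System.nanoTime();
        // Simulated fetch taking about 50 ms; five fetches should take about 1 s, not 1.25 s.
        pollLoop(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 200, 5);
        long totalMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
        System.out.println("5 fetches took about " + totalMillis + " ms");
    }
}
```

Measuring with `System.nanoTime()` keeps the pacing immune to wall-clock adjustments. `ScheduledExecutorService.scheduleAtFixedRate` would be another way to get fixed-rate starts, but an explicit loop makes it easy to skip the sleep entirely when a fetch already took longer than the interval.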

> Adapt default values of the Kinesis connector
> ---------------------------------------------
>
>                 Key: FLINK-6365
>                 URL: https://issues.apache.org/jira/browse/FLINK-6365
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kinesis Connector
>    Affects Versions: 1.2.0
>            Reporter: Steffen Hausmann
>            Assignee: Bowen Li
>            Priority: Minor
>             Fix For: 1.4.0, 1.3.2
>
>
> As discussed in 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-connector-SHARD-GETRECORDS-MAX-default-value-td12332.html,
> it seems reasonable to change the default values of the Kinesis connector to 
> follow KCL’s default settings. I suggest adapting at least the values for 
> SHARD_GETRECORDS_MAX and SHARD_GETRECORDS_INTERVAL_MILLIS.
> As a Kinesis shard is currently limited to 5 get operations per second, you 
> can observe high ReadProvisionedThroughputExceeded rates with the current 
> default value of 0 for SHARD_GETRECORDS_INTERVAL_MILLIS; it seems reasonable 
> to increase it to 200. As described in the email thread, it also seems 
> desirable to increase the default value for SHARD_GETRECORDS_MAX to 10000.
> The values that are used by the KCL can be found here: 
> https://github.com/awslabs/amazon-kinesis-client/blob/master/src/main/java/com/amazonaws/services/kinesis/clientlibrary/lib/worker/KinesisClientLibConfiguration.java
> Thanks for looking into this!
> Steffen
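For reference, a consumer could opt into the proposed values explicitly, and the resulting per-shard upper bounds follow from simple arithmetic. This is a minimal sketch: in a real job the keys would come from Flink's `ConsumerConfigConstants.SHARD_GETRECORDS_MAX` and `ConsumerConfigConstants.SHARD_GETRECORDS_INTERVAL_MILLIS`, and the `Properties` would be passed to the `FlinkKinesisConsumer` constructor. The literal key strings below are assumed to match those constants and should be checked against the Flink version in use; `KinesisConsumerDefaults` and its helpers are hypothetical.

```java
import java.util.Properties;

public class KinesisConsumerDefaults {

    // Assumed string values of ConsumerConfigConstants.SHARD_GETRECORDS_MAX and
    // ConsumerConfigConstants.SHARD_GETRECORDS_INTERVAL_MILLIS; verify for your Flink version.
    static final String GETRECORDS_MAX_KEY = "flink.shard.getrecords.maxrecordcount";
    static final String GETRECORDS_INTERVAL_KEY = "flink.shard.getrecords.intervalmillis";

    /** Consumer properties using the KCL-style values proposed in this issue. */
    static Properties proposedDefaults() {
        Properties props = new Properties();
        props.setProperty(GETRECORDS_MAX_KEY, "10000");
        props.setProperty(GETRECORDS_INTERVAL_KEY, "200");
        return props;
    }

    /** Upper bound on GetRecords calls per second per shard (fetch time ignored). */
    static double callsPerSecond(long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("intervalMillis must be positive");
        }
        return 1000.0 / intervalMillis;
    }

    /** Upper bound on records per second per shard, ignoring byte-based limits. */
    static double maxRecordsPerSecond(long maxRecordsPerCall, long intervalMillis) {
        return maxRecordsPerCall * callsPerSecond(intervalMillis);
    }

    public static void main(String[] args) {
        Properties props = proposedDefaults();
        long max = Long.parseLong(props.getProperty(GETRECORDS_MAX_KEY));
        long interval = Long.parseLong(props.getProperty(GETRECORDS_INTERVAL_KEY));
        System.out.println("calls/s per shard (upper bound): " + callsPerSecond(interval));
        System.out.println("records/s per shard (upper bound): " + maxRecordsPerSecond(max, interval));
    }
}
```

Since the connector sleeps for the interval after each fetch, 1000 / interval is only an upper bound on the call rate; at 200 ms it equals the 5 calls per second a shard allows. With 10000 records per call the record-count bound becomes 50,000 records per second per shard, although the shard's byte-based read limit is likely to be the tighter constraint in practice.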



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
