[ https://issues.apache.org/jira/browse/FLINK-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094156#comment-16094156 ]
Bowen Li commented on FLINK-6365:
---------------------------------
OK. For {{SHARD_GETRECORDS_MAX}}, {{10,000}} it is, since we all agree on that
value. We tested it in our production environment, and it works well: it
greatly reduces the number of requests to Kinesis.
For {{SHARD_GETRECORDS_INTERVAL_MILLIS}}, I second [~sthm]'s proposal. In
practice, I set this value to 2,000 ms (yes, 2 s) in our production Flink job,
because 0 ms overwhelmed our 36-shard Kinesis stream, and setting
{{SHARD_GETRECORDS_MAX}} to 10,000 makes up for the longer interval. I'm also
evaluating the interval theoretically in relation to the parallelism of the
Flink source (1) and the number of shards in the Kinesis stream (2).
* When (1) = (2), each parallel Flink source instance reads from exactly one
Kinesis shard. 200 ms is much better than 0 ms here, because it lets the source
read at the maximum allowed rate (5 requests/sec per shard) without exceeding
the shard's read capacity.
* When (1) > (2), some (or all) Kinesis shards are read by more than one
parallel Flink source. 200 ms is still better than 0 ms, because a) 200 ms keeps
a shard at no more than 5 requests/sec (its limit) if that shard is read by one
Flink source, and b) 200 ms can greatly lower the number of requests if that
shard is read by more than one Flink source, and thereby lower Flink's read
latency.
* When (1) < (2), some (or all) Flink source instances read from more than one
Kinesis shard. 200 ms probably cannot use some shards' full read capacity, so a
shorter interval seems more reasonable. However, 0 ms is still too aggressive.
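The case analysis above can be made concrete with a simplified rate model. This is a sketch under stated assumptions, not the connector's actual scheduling: each reader of a shard is assumed to loop "call GetRecords, then sleep for the interval", each call is assumed to take a fixed, hypothetical latency, and the function and constant names are illustrative. It estimates the GetRecords calls per second hitting one shard and compares them with the 5-calls-per-second limit cited in the issue description.

```python
"""Simplified model of GetRecords load on one Kinesis shard.

Assumption (not Flink's actual scheduler): every reader of a shard loops
"call GetRecords, then sleep interval_ms", and each call takes
call_latency_ms to return.
"""

# Per-shard limit on GetRecords calls per second, as cited in FLINK-6365.
GETRECORDS_CALLS_PER_SHARD_PER_SEC = 5


def calls_per_shard_per_sec(interval_ms: float, readers_per_shard: int,
                            call_latency_ms: float) -> float:
    """Approximate GetRecords calls per second hitting one shard."""
    cycle_ms = interval_ms + call_latency_ms  # one call plus one sleep
    if cycle_ms <= 0:
        raise ValueError("interval_ms + call_latency_ms must be positive")
    return readers_per_shard * 1000.0 / cycle_ms


def exceeds_limit(interval_ms: float, readers_per_shard: int,
                  call_latency_ms: float) -> bool:
    """True if the modelled call rate is above the per-shard limit."""
    rate = calls_per_shard_per_sec(interval_ms, readers_per_shard,
                                   call_latency_ms)
    return rate > GETRECORDS_CALLS_PER_SHARD_PER_SEC


if __name__ == "__main__":
    latency_ms = 50  # hypothetical round-trip time of one GetRecords call
    for interval_ms in (0, 200, 2000):
        for readers in (1, 2):
            rate = calls_per_shard_per_sec(interval_ms, readers, latency_ms)
            print(f"interval={interval_ms:>4} ms, readers={readers}: "
                  f"{rate:5.2f} calls/s, over limit: "
                  f"{exceeds_limit(interval_ms, readers, latency_ms)}")
```

With a hypothetical 50 ms call latency, the model gives 20 calls/sec for one reader at 0 ms, 4 calls/sec at 200 ms, and under 1 call/sec at 2,000 ms; two readers on one shard at 200 ms would reach 8 calls/sec, above the limit.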
In short, 200 ms at least keeps Flink working, and 0 ms does not. Besides,
given that Steffen works for AWS, I give his opinion more weight.
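The two settings discussed in this comment are passed to the Flink Kinesis consumer as configuration properties. The sketch below builds them as a plain string map, mirroring a java.util.Properties; the helper name is hypothetical, and the key strings are assumed to match ConsumerConfigConstants.SHARD_GETRECORDS_MAX and ConsumerConfigConstants.SHARD_GETRECORDS_INTERVAL_MILLIS in the connector, so check them against your Flink version. The upper bound of 10,000 reflects the per-call record limit of the Kinesis GetRecords API.

```python
# Property keys, assumed to match the string values of
# ConsumerConfigConstants.SHARD_GETRECORDS_MAX and
# ConsumerConfigConstants.SHARD_GETRECORDS_INTERVAL_MILLIS; verify them
# against the Flink Kinesis connector version you use.
SHARD_GETRECORDS_MAX = "flink.shard.getrecords.maxrecordcount"
SHARD_GETRECORDS_INTERVAL_MILLIS = "flink.shard.getrecords.intervalmillis"

# The Kinesis GetRecords API returns at most 10,000 records per call.
GETRECORDS_RECORD_LIMIT = 10_000


def shard_polling_properties(max_records: int, interval_ms: int) -> dict[str, str]:
    """Hypothetical helper: build the shard-polling consumer properties.

    Values are strings, as they would be in a java.util.Properties.
    """
    if not 1 <= max_records <= GETRECORDS_RECORD_LIMIT:
        raise ValueError(f"max_records must be in [1, {GETRECORDS_RECORD_LIMIT}]")
    if interval_ms < 0:
        raise ValueError("interval_ms must be non-negative")
    return {
        SHARD_GETRECORDS_MAX: str(max_records),
        SHARD_GETRECORDS_INTERVAL_MILLIS: str(interval_ms),
    }


if __name__ == "__main__":
    # Values proposed in the issue vs. the production setting described above.
    print(shard_polling_properties(10_000, 200))
    print(shard_polling_properties(10_000, 2_000))
```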
> Adapt default values of the Kinesis connector
> ---------------------------------------------
>
> Key: FLINK-6365
> URL: https://issues.apache.org/jira/browse/FLINK-6365
> Project: Flink
> Issue Type: Improvement
> Components: Kinesis Connector
> Affects Versions: 1.2.0
> Reporter: Steffen Hausmann
> Assignee: Bowen Li
> Priority: Minor
> Fix For: 1.4.0, 1.3.2
>
>
> As discussed in
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-connector-SHARD-GETRECORDS-MAX-default-value-td12332.html,
> it seems reasonable to change the default values of the Kinesis connector to
> follow KCL’s default settings. I suggest adapting at least the values of
> SHARD_GETRECORDS_MAX and SHARD_GETRECORDS_INTERVAL_MILLIS.
> As a Kinesis shard is currently limited to 5 get operations per second, you
> can observe high ReadProvisionedThroughputExceeded rates with the current
> default value for SHARD_GETRECORDS_INTERVAL_MILLIS of 0; it seems reasonable
> to increase it to 200. As described in the email thread, it also seems
> desirable to increase the default value for SHARD_GETRECORDS_MAX to 10000.
> The values that are used by the KCL can be found here:
> https://github.com/awslabs/amazon-kinesis-client/blob/master/src/main/java/com/amazonaws/services/kinesis/clientlibrary/lib/worker/KinesisClientLibConfiguration.java
> Thanks for looking into this!
> Steffen
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)