[ 
https://issues.apache.org/jira/browse/FLINK-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119222#comment-16119222
 ] 

Bowen Li commented on FLINK-7223:
---------------------------------

I'm actually fine with the default value, since we are already aware of this 
issue, and we override the default value ourself. But from a enterprise user's 
point of view, the assumption of 'this means running up to 50 Flink jobs per 
account' is not practical at all for a big AWS enterprise customer. Here's why:

In an enterprise AWS account, lots of other services are contributing to 
saturating the 10requests/sec throughput. In our (OfferUp, inc) prod AWS 
account, we are hitting that cap even before having Flink. Having more and more 
Flink jobs makes things worse, and breaks other services. Our Flink jobs are 
not more important than other services, so we can't either allocate resources 
solely for Flink or compete with other services. Thus we set Flink's discovery 
interval to be {{1 hour}}. 

It's probably not a topic worths a long discussion :) If you guys feel the 
default value is ok, I'll good with it

> Increase DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS for Flink-kinesis-connector
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-7223
>                 URL: https://issues.apache.org/jira/browse/FLINK-7223
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kinesis Connector
>    Affects Versions: 1.3.0
>            Reporter: Bowen Li
>            Assignee: Bowen Li
>            Priority: Minor
>             Fix For: 1.4.0
>
>
> Background: {{DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS}} in 
> {{org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants}}
>  is the default value for Flink to call Kinesis's {{describeStream()}} API.
> Problem: Right now, its value is 10,000millis (10sec), which is too short. We 
> ran into problems that Flink-kinesis-connector's call of {{describeStream()}} 
> exceeds Kinesis rate limit, and broken Flink taskmanager.
> According to 
> http://docs.aws.amazon.com/kinesis/latest/APIReference/API_DescribeStream.html,
>  
> "This operation has a limit of 10 transactions per second per account.". What 
> it means is that the 10transaction/account is a limit on a single 
> organization's AWS account......:(  We contacted AWS Support, and confirmed 
> this. If you have more applications (either other Flink apps or non-Flink 
> apps) competing aggressively with your Flink app on this API, your Flink app 
> breaks. 
> I propose increasing the value DEFAULT_SHARD_DISCOVERY_INTERVAL_MILLIS from 
> 10,000millis(10sec) to preferably 300,000 (5min). Or at least 60,000 (1min) 
> if anyone has a solid reason arguing that 5min is too long, 
> This is also related to https://issues.apache.org/jira/browse/FLINK-6365



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to