[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read

ASF GitHub Bot (Jira) Fri, 18 Oct 2019 08:40:41 -0700


     [ 
https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=330576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330576
 ]


ASF GitHub Bot logged work on BEAM-8382:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Oct/19 15:39
            Start Date: 18/Oct/19 15:39
    Worklog Time Spent: 10m 
      Work Description: aromanenko-dev commented on issue #9765: [BEAM-8382] 
Add polling interval to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9765#issuecomment-543801925
 
 
   @jfarr Regarding backoff strategy  - I think it's a good way in case if your 
backend doesn't provide any statistics in advance at which rate you have to 
perform next queries. We know only maximum limits and the result of last 
`kinesis.getRecords()` execution. So, starting with recommended default value 
and use proper backoff strategy in case of failures it could be optimal 
solution when you need to limit rate of requests.  Otherwise, there are big 
chances that backend resources won't be utilised efficiently. `FluentBackoff` 
is used in many Beam IOs for similar reasons.
   
   I'd like to ask you questions about your use case. Let's suppose we can have 
ability set static polling interval with `withPollingInterval(Duration)` (as 
implemented in current PR). What would be default value in your case? 1 sec? 
What would you do in case if it won't be enough for already running pipeline? 
Would your need to change this value and restart pipeline manually?
   
   Btw, default delay of 1 sec looks quite pessimistic since according to AWS 
doc `each shard can support up to five read transactions per second.` So, in 
case of one consumer per shard (I can guess this is a quite common case) it 
should be 200 ms as default value. 
   
   PS: The goal of minimizing the number of tuning knobs is to reduce the 
number of possible config combinations (which grows exponentially) in case if 
it's possible to configure this automatically [1]. If it's not possible then 
the better solution will be to provide flexible API, especially for unbounded 
sources (as Kinesis, Kafka, etc), that are being used in long-running 
pipelines.  
   
   [1] https://beam.apache.org/contribute/ptransform-style-guide/#configuration
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 330576)
    Time Spent: 2h  (was: 1h 50m)

> Add polling interval to KinesisIO.Read
> --------------------------------------
>
>                 Key: BEAM-8382
>                 URL: https://issues.apache.org/jira/browse/BEAM-8382
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-kinesis
>    Affects Versions: 2.13.0, 2.14.0, 2.15.0
>            Reporter: Jonothan Farr
>            Assignee: Jonothan Farr
>            Priority: Major
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> With the current implementation we are observing Kinesis throttling due to 
> ReadProvisionedThroughputExceeded on the order of hundreds of times per 
> second, regardless of the actual Kinesis throughput. This is because the 
> ShardReadersPool readLoop() method is polling getRecords() as fast as 
> possible.
> From the KDS documentation:
> {quote}Each shard can support up to five read transactions per second.
> {quote}
> and
> {quote}For best results, sleep for at least 1 second (1,000 milliseconds) 
> between calls to getRecords to avoid exceeding the limit on getRecords 
> frequency.
> {quote}
> [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html]
> [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read

Reply via email to