[ 
https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=328844&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328844
 ]

ASF GitHub Bot logged work on BEAM-8382:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Oct/19 23:24
            Start Date: 15/Oct/19 23:24
    Worklog Time Spent: 10m 
      Work Description: jfarr commented on issue #9765: [BEAM-8382] Add polling 
interval to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9765#issuecomment-542445144
 
 
   @aromanenko-dev If we're the only consumer on the stream we can call 
getRecords every 200ms without getting throttled (in theory). If we introduce a 
1 sec delay on the first KMSThrottlingException then we have an unnecessary 
800ms of latency. Let's say we reduce the initial delay to 100ms. Assuming a 
successful call takes about 10ms, we'll hit an exception after 50ms and for the 
next 950ms thereafter. On the 4th retry we'll succeed but we'll be at 800ms. 
Reduce it to 10ms and on the 7th retry we'll succeed but we'll be at 640ms. In 
any case we've introduced another knob (initial backoff delay time).
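
   The retry counts above can be reproduced with a rough sketch. It is only 
a model of the arithmetic, not Beam or AWS SDK code, and every name in it is 
hypothetical. It assumes the delay doubles after each throttled retry, that 
the first exception arrives 50ms in, that throttled calls do not count against 
the limit, and that the throttle clears once 1 second has passed since the 
first call.

```python
def retries_until_success(initial_delay_ms, throttled_at_ms=50, window_ms=1000):
    """Model doubling backoff against a 1 second rolling window.

    Returns (retry_number, last_delay_ms, elapsed_ms) for the first retry
    that lands after the window has cleared. All parameters are assumptions
    taken from the scenario in this comment, not measured values.
    """
    if initial_delay_ms <= 0:
        raise ValueError("initial_delay_ms must be positive")
    elapsed = throttled_at_ms   # time of the first throttling exception
    delay = initial_delay_ms
    retry = 0
    while True:
        retry += 1
        elapsed += delay        # sleep, then retry
        if elapsed >= window_ms:
            return retry, delay, elapsed
        delay *= 2              # still throttled: double the delay


if __name__ == "__main__":
    for initial in (1000, 100, 10):
        print(initial, retries_until_success(initial))
```

   Under those assumptions a 100ms initial delay succeeds on the 4th retry 
with the delay at 800ms, and a 10ms initial delay succeeds on the 7th retry 
with the delay at 640ms.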
   
   So that's assuming delay time starts at zero and only increases when we get 
a KMSThrottlingException. Since that's 100% guaranteed to happen we could try 
with a non-zero initial delay instead. Maybe if we start at 200ms we won't get 
throttled at all and we won't overshoot. Another knob.
   
   OK so now we've overshot. Maybe we can ease back on the delay. We can 
speculatively try a shorter delay time and see if we still get throttled. How 
often to try? How much to ease back? More knobs.
   
   You can fiddle with these knobs and come up with something that works well 
when you're the only consumer, but as soon as you have 2 or more consumers you 
have to throw that out the window. If you're pulling back more or fewer records, 
that may take more or less time and throw off your timing. You can't really 
hardcode anything because what works well in one scenario may not work well in 
another.
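
   To make the multi-consumer point concrete, here is a minimal sketch of how 
the sustainable polling interval scales. It assumes the limit of five read 
transactions per second per shard quoted in the issue, and it assumes the 
consumers share that limit evenly, which real consumers may not do. The 
function name is hypothetical.

```python
READS_PER_SECOND_PER_SHARD = 5  # limit quoted from the KDS documentation


def min_polling_interval_ms(consumers):
    """Smallest per-consumer interval that stays within the shard limit,
    assuming all consumers poll at the same steady rate (an assumption)."""
    if consumers < 1:
        raise ValueError("consumers must be at least 1")
    return 1000 * consumers / READS_PER_SECOND_PER_SHARD


if __name__ == "__main__":
    for n in (1, 2, 5):
        print(n, min_polling_interval_ms(n))
```

   With one consumer that gives 200ms, with two it gives 400ms, and with five 
it reaches the 1 second interval recommended in the documentation.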
   
   So those are my initial thoughts. I'm totally open to the idea that I'm just 
overthinking it. I think the ultimate test would be to try it out and see what 
works. But if you want to minimize latency (and we do) I think the 1 second 
rolling window introduces a feedback delay that makes this a little more 
complicated than it looks at first glance. What do you think?
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 328844)
    Time Spent: 1h  (was: 50m)

> Add polling interval to KinesisIO.Read
> --------------------------------------
>
>                 Key: BEAM-8382
>                 URL: https://issues.apache.org/jira/browse/BEAM-8382
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-kinesis
>    Affects Versions: 2.13.0, 2.14.0, 2.15.0
>            Reporter: Jonothan Farr
>            Assignee: Jonothan Farr
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> With the current implementation we are observing Kinesis throttling due to 
> ReadProvisionedThroughputExceeded on the order of hundreds of times per 
> second, regardless of the actual Kinesis throughput. This is because the 
> ShardReadersPool readLoop() method is polling getRecords() as fast as 
> possible.
> From the KDS documentation:
> {quote}Each shard can support up to five read transactions per second.
> {quote}
> and
> {quote}For best results, sleep for at least 1 second (1,000 milliseconds) 
> between calls to getRecords to avoid exceeding the limit on getRecords 
> frequency.
> {quote}
> [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html]
> [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
