sajjad-moradi commented on pull request #6291:
URL: https://github.com/apache/incubator-pinot/pull/6291#issuecomment-733995748


   > Can you note in your checkin comments that what we are throttling is not 
really the consumption, but the _processing_ of messages. We will still consume 
as much as we can from the stream.
   > 
   > If we were to limit consumption, then the behavior would be somewhat like:
   > 
   > ```
   > while (true) {
   >   consumeMsgsAsPerSomeAllowedRate()
   >   processAllMsgsConsumed()
   >   sleepAsIndicatedByRateLimiter()
   > }
   > ```
   > 
   > Whereas by rate limiting the processing, we are doing the following:
   > 
   > ```
   > while (true) {
   >   consumeAllMsgsThatWeCan()
   >   foreach(msg) {
   >     processMsg()
   >     sleepAsDictatedByRateLimiter()
   >   }
   > }
   > ```
   > 
   > So, we may be taking up more heap in the second case?
   
   I actually looked into that, and Kafka doesn't provide an API to retrieve a 
limited number of messages. IMO, rate limiting the processing has much the same 
effect as rate limiting the consumption, because we process the messages 
synchronously right after they are polled from Kafka. For example, assume that 
during a bursty period the incoming rate of Kafka messages is 100 msgs/sec and 
we have set the rate limit to 20 msgs/sec. Over a period of 10 seconds we then 
process only 200 messages, and while we're processing messages we don't consume 
new ones. This effectively brings the consumption rate down to 20 msgs/sec, 
whereas with no rate limit we would have consumed 1000 messages at 100 msgs/sec.
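
   To make the example concrete, here is a rough simulation of the argument. It 
is only a sketch, not the code in this PR: the class name, `simulate`, and the 
`maxBatch` parameter (a stand-in for however many messages a single fetch 
happens to return) are all made up for illustration, and time is simulated 
rather than slept.

```java
public class ProcessingThrottleSketch {

    /**
     * Simulates a consumer that polls a batch and then processes it
     * synchronously at a limited rate, never polling while it is processing.
     * Returns {messagesPulledFromStream, messagesProcessed}.
     * All names and parameters here are hypothetical.
     */
    static long[] simulate(int incomingPerSec, int limitPerSec, int durationSec, int maxBatch) {
        long durationMs = durationSec * 1000L;
        // Pause enforced per processed message; rounds down to 0 for limits above 1000/sec.
        long perMsgMs = 1000L / limitPerSec;
        long nowMs = 0, pulled = 0, processed = 0;
        while (nowMs < durationMs) {
            // Messages the stream has produced up to the current simulated time.
            long produced = nowMs * incomingPerSec / 1000;
            long batch = Math.min(produced - pulled, maxBatch);
            if (batch == 0) {
                nowMs++; // nothing to fetch yet, wait 1 ms
                continue;
            }
            pulled += batch; // one poll
            for (long i = 0; i < batch && nowMs < durationMs; i++) {
                processed++;
                nowMs += perMsgMs; // throttled: no polling happens during this time
            }
        }
        return new long[] {pulled, processed};
    }

    public static void main(String[] args) {
        long[] throttled = simulate(100, 20, 10, 10);
        long[] unthrottled = simulate(100, 1_000_000, 10, 10);
        System.out.println("throttled:   pulled=" + throttled[0] + " processed=" + throttled[1]);
        System.out.println("unthrottled: pulled=" + unthrottled[0] + " processed=" + unthrottled[1]);
    }
}
```

   In this model the throttled run pulls only about 200 messages in 10 seconds, 
and the number of pulled-but-unprocessed messages never exceeds one batch, 
while the unthrottled run pulls close to 1000.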
   
   Side note:
   The Kafka consumer has a configuration, `max.partition.fetch.bytes`, that 
limits the number of bytes fetched per partition. That is a bit hard to utilize 
here, as the intention is to limit the number of messages consumed rather than 
the number of bytes.
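
   A small sketch of why the byte cap maps poorly onto a message count: the 
number of messages per fetch depends on the average message size, which we 
don't control. The class and helper names below are hypothetical, and the 
message sizes are made-up inputs; only the `max.partition.fetch.bytes` key is 
an actual Kafka consumer setting.

```java
import java.util.Properties;

public class FetchBytesSketch {

    /** Builds consumer properties that cap the bytes fetched per partition. */
    static Properties consumerProps(int maxPartitionFetchBytes) {
        Properties props = new Properties();
        props.setProperty("max.partition.fetch.bytes", Integer.toString(maxPartitionFetchBytes));
        return props;
    }

    /** Rough estimate of how many messages fit under the byte cap (hypothetical helper). */
    static long approxMessagesPerFetch(int maxPartitionFetchBytes, int avgMessageBytes) {
        return maxPartitionFetchBytes / avgMessageBytes;
    }

    public static void main(String[] args) {
        int cap = 1024 * 1024; // 1 MiB cap, chosen for illustration
        System.out.println(consumerProps(cap));
        // The same byte cap yields very different message counts.
        System.out.println(approxMessagesPerFetch(cap, 512));  // small messages
        System.out.println(approxMessagesPerFetch(cap, 4096)); // larger messages
    }
}
```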
   




