Alxander64 opened a new issue #9267:
URL: https://github.com/apache/pulsar/issues/9267


   **Is your enhancement request related to a problem? Please describe.**
   
   I have some JDBC sinks that I recently reconfigured to have a longer timeout 
and a larger batch size. This helped because the topic had a steady publish 
rate of about 100 msg/s, so these settings prevent the sink from spamming 
inserts to the database.
   
   In some scenarios, I'd want to start these sinks on data from a day or two 
ago and let them catch up to live data. Even with a batch size of 100,000 and a 
timeout of 1 minute, the sink consumes messages from the topic very fast. It 
would receive 100,000 messages in much less than 1 minute, so it would still 
spam inserts to the database. I don't want to make the batch size much larger, 
because I suspect there would still be issues: the sink would be waiting on the 
database to finish one insert while more large batches are already ready to go.
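
   To make the numbers above concrete, here is a rough sketch of the flush 
cadence. It assumes the sink flushes as soon as either the batch size or the 
timeout is reached. The catch-up rate of 20,000 msg/s is a made-up figure for 
illustration only, and the function names are hypothetical.

```python
def flush_interval_seconds(batch_size, timeout_s, consume_rate):
    """Seconds between flushes: whichever limit is reached first."""
    return min(timeout_s, batch_size / consume_rate)


def rows_per_flush(batch_size, timeout_s, consume_rate):
    """Rows written per flush under the same assumption."""
    return min(batch_size, int(consume_rate * timeout_s))


BATCH_SIZE = 100_000
TIMEOUT_S = 60

# Live traffic at about 100 msg/s: the timeout is reached first.
live_interval = flush_interval_seconds(BATCH_SIZE, TIMEOUT_S, 100)   # 60
live_rows = rows_per_flush(BATCH_SIZE, TIMEOUT_S, 100)               # 6000

# Catch-up at an ASSUMED 20,000 msg/s: the batch fills first.
catchup_interval = flush_interval_seconds(BATCH_SIZE, TIMEOUT_S, 20_000)  # 5.0
catchup_rows = rows_per_flush(BATCH_SIZE, TIMEOUT_S, 20_000)              # 100000
```

   Under these assumptions the database would see a full 100,000-row insert 
every few seconds during catch-up, instead of roughly 6,000 rows once a minute.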
   
   **Describe the solution you'd like**
   
   An option for sink connectors, similar to `--rate` of the `pulsar-client` 
CLI. This would let the user specify a throughput rate for the sink consumer. 
This can be used to set the consumption rate only slightly higher than the 
expected live data rate, so the sink can still catch up but not spam inserts as 
often.
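
   As a rough illustration of the idea (not how Pulsar or `pulsar-client` 
implements it), a consumer-side limiter could pace each receive call. The 
`RateLimiter` class and `catch_up_seconds` helper below are hypothetical names, 
and the 150 msg/s cap is an assumed value.

```python
import time


class RateLimiter:
    """Paces acquire() calls to at most `rate` permits per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        if rate <= 0:
            raise ValueError("rate must be positive")
        self._interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_free = clock()

    def acquire(self):
        """Block until the next permit is due, then reserve the one after."""
        now = self._clock()
        if now < self._next_free:
            self._sleep(self._next_free - now)
        # Idle time is not banked, so a quiet period cannot cause a burst.
        self._next_free = max(now, self._next_free) + self._interval


def catch_up_seconds(backlog_msgs, live_rate, cap_rate):
    """Time to drain a backlog while live traffic keeps arriving."""
    if cap_rate <= live_rate:
        raise ValueError("cap must exceed the live rate to ever catch up")
    return backlog_msgs / (cap_rate - live_rate)


# One day of backlog at 100 msg/s, drained with an ASSUMED cap of 150 msg/s.
one_day_backlog = 100 * 24 * 60 * 60                       # 8,640,000 messages
drain_time = catch_up_seconds(one_day_backlog, 100, 150)   # 172800.0 s (48 h)
```

   A sink would call `acquire()` before handling each message. The closer the 
cap is to the live rate, the gentler the load on the database, but the longer 
the backlog takes to drain.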
   
   I suppose this works best when there is a steady, expected level of 
throughput on the topic. I'm not sure if this behaviour can be made more 
dynamic in any way.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

