Pavel Kuznetsov created KAFKA-10327:
---------------------------------------
Summary: Make flush after some count of putted records in SinkTask
Key: KAFKA-10327
URL: https://issues.apache.org/jira/browse/KAFKA-10327
Project: Kafka
Issue Type: Improvement
Components: KafkaConnect
Affects Versions: 2.5.0
Reporter: Pavel Kuznetsov
In the current version of Kafka Connect, all records accumulated by the
SinkTask.put method are flushed to the target system on a time basis: data is
flushed and offsets are committed every offset.flush.interval.ms (default
60000 ms). However, you cannot control the number of messages received from
Kafka between two flushes. This may cause out-of-memory errors, because the
in-memory buffer can grow very large.
I suggest adding out-of-the-box support for count-based flushing to Kafka
Connect. It requires a new configuration parameter (offset.flush.count, for
example). The number of records sent to SinkTask.put should be counted, and
when this count exceeds the value of offset.flush.count, SinkTask.flush is
called and offsets are committed.
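
The idea can be sketched in plain Java, independent of the Kafka Connect
classes. CountBasedBuffer, flushCount, and the flusher callback are
hypothetical names used only for illustration: flushCount stands in for the
proposed offset.flush.count, and the callback stands in for SinkTask.flush
followed by an offset commit. This sketch flushes as soon as the count reaches
the threshold; a real implementation inside the framework may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper (not part of Kafka Connect): buffers records and flushes by count. */
final class CountBasedBuffer<T> {
    private final int flushCount;             // stands in for the proposed offset.flush.count
    private final Consumer<List<T>> flusher;  // stands in for SinkTask.flush plus offset commit
    private final List<T> buffer = new ArrayList<>();

    CountBasedBuffer(int flushCount, Consumer<List<T>> flusher) {
        if (flushCount <= 0) {
            throw new IllegalArgumentException("flushCount must be positive");
        }
        this.flushCount = flushCount;
        this.flusher = flusher;
    }

    /** Mirrors SinkTask.put: accepts a batch and flushes whenever the threshold is reached. */
    void put(List<T> records) {
        for (T record : records) {
            buffer.add(record);
            if (buffer.size() >= flushCount) {
                flush();
            }
        }
    }

    /** Can also be called by a time-based trigger, so both policies can coexist. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flusher.accept(new ArrayList<>(buffer)); // hand over a copy, then reset the buffer
        buffer.clear();
    }

    /** Number of records buffered since the last flush. */
    int pending() {
        return buffer.size();
    }
}
```

With flushCount set to 3, putting four records triggers one flush of the first
three and leaves one record pending until the next count-based or time-based
flush, so the buffer never holds more than flushCount records.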