[ https://issues.apache.org/jira/browse/KAFKA-17424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ajit Singh updated KAFKA-17424:
-------------------------------
    Description: 
When Kafka Connect gives the sink task its own copy of the List<SinkRecord>, RAM 
utilisation shoots up: at that moment there are two lists in memory, and the 
original list is only cleared after the sink worker finishes the current batch.
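The delivery step described above can be pictured with a minimal sketch. It 
assumes a hypothetical `RecordSink` interface standing in for the real 
`SinkTask.put(Collection<SinkRecord>)`, a hypothetical `CopyingWorker`, and plain 
`String`s in place of `SinkRecord`s. It is not the actual worker code, only the 
copy-then-clear shape the description refers to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SinkTask.put(Collection<SinkRecord>);
// Strings stand in for SinkRecords.
interface RecordSink {
    void put(List<String> records);
}

// Hypothetical worker illustrating the copy-then-clear pattern.
class CopyingWorker {
    // Declared final: the worker reuses this one list for every batch.
    private final List<String> batch = new ArrayList<>();

    void add(String record) {
        batch.add(record);
    }

    List<String> batch() {
        return batch;
    }

    // Returns the number of records delivered.
    int deliver(RecordSink sink) {
        // The task gets its own copy; while put() runs, the worker's list and
        // the copy are both reachable, so two lists of the same length exist.
        sink.put(new ArrayList<>(batch));
        int delivered = batch.size();
        // Only after the task returns is the original list emptied.
        batch.clear();
        return delivered;
    }
}
```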

Originally the list is declared final and a copy of it is provided to the sink 
task, because sink tasks can be custom and we let users process the records 
however they want without any risk. But one of the most popular uses of Kafka 
Connect is OLTP-to-OLAP replication, and during initial copies/snapshots a lot 
of data is generated rapidly, which fills the list to its maximum batch size 
and leaves us prone to "Out of Memory" errors. The only use of the list is: 
get filled > cloned for the sink > get size > cleared > repeat. So I take the 
size of the list before giving the original list to the sink task, and after 
the sink has performed its operations, set list = new ArrayList<>(). I did not 
use clear() just in case the sink task has set our list to null.
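The proposed change can be sketched the same way, again with hypothetical names 
(`RecordSink`, `HandOffWorker`) and `String`s standing in for records. The 
worker hands over its own list, reads the size before `put()`, and afterwards 
replaces the reference with a new `ArrayList` instead of calling `clear()`. This 
illustrates the idea in the description; it is not the submitted patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SinkTask.put(Collection<SinkRecord>);
// Strings stand in for SinkRecords.
interface RecordSink {
    void put(List<String> records);
}

// Hypothetical worker sketching the proposal: no defensive copy.
class HandOffWorker {
    // No longer final, so the reference can be replaced after each delivery.
    private List<String> batch = new ArrayList<>();

    void add(String record) {
        batch.add(record);
    }

    List<String> batch() {
        return batch;
    }

    // Returns the number of records delivered.
    int deliver(RecordSink sink) {
        // Capture the size first: once the task owns the list it may modify
        // or empty it, so batch.size() after put() is not reliable.
        int delivered = batch.size();
        sink.put(batch);
        // Replace rather than clear(): a task that kept a reference to the
        // delivered list is not affected when the worker starts a new batch.
        batch = new ArrayList<>();
        return delivered;
    }
}
```

Capturing the size before `put()` is what keeps the count correct even if the 
task empties the list it was given, as in the test below.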

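There is a time vs. memory trade-off:
In the original approach, the JVM does not have to spend time finding free 
memory, because the same cleared list is reused for every batch.

In the new approach, the JVM has to allocate a new list (finding free memory 
addresses) for every batch, but the previous list can then be garbage-collected, 
which results in more free memory.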
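The trade-off comes down to how the batch is reset between deliveries. In the 
minimal sketch below (a hypothetical `BatchReset` helper), `clear()` keeps the 
same `ArrayList` instance, including its backing array, which `ArrayList.clear()` 
does not shrink, so the next batch needs no new allocation; reassigning allocates 
a fresh list, and the old one becomes eligible for garbage collection once 
nothing else references it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper contrasting the two ways to reset the worker's batch.
final class BatchReset {
    private BatchReset() {}

    // Original approach: empty the list in place. For an ArrayList, the
    // instance and its backing array are reused (clear() keeps the capacity).
    static <T> List<T> resetByClear(List<T> batch) {
        batch.clear();
        return batch;
    }

    // New approach: allocate a fresh, empty list. The previous instance is no
    // longer referenced by the worker and can be garbage-collected once the
    // sink task no longer holds it either.
    static <T> List<T> resetByReassign() {
        return new ArrayList<>();
    }
}
```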

  was:When Kafka Connect gives the sink task its own copy of the 
List<SinkRecord>, RAM utilisation shoots up: at that moment there are two lists 
in memory, and the original list is only cleared after the sink worker finishes 
the current batch.


> Memory optimisation for Kafka-connect
> -------------------------------------
>
>                 Key: KAFKA-17424
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17424
>             Project: Kafka
>          Issue Type: Improvement
>          Components: connect
>    Affects Versions: 3.8.0
>            Reporter: Ajit Singh
>            Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
