yashmayya opened a new pull request, #12800:
URL: https://github.com/apache/kafka/pull/12800

   - https://issues.apache.org/jira/browse/KAFKA-14342
   - 
[KafkaOffsetBackingStore](https://github.com/apache/kafka/blob/56d588d55ac313c0efca586a3bcd984c99a89018/connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaOffsetBackingStore.java#L70)
 is used to track source connector offsets using a backing Kafka topic. It 
implements interface methods to get and set offsets using a 
[KafkaBasedLog](https://github.com/apache/kafka/blob/56d588d55ac313c0efca586a3bcd984c99a89018/connect/runtime/src/main/java/org/apache/kafka/connect/util/KafkaBasedLog.java#L80).
 It also maintains an in-memory map containing {partition, offset} entries for 
source connectors (which is populated via the consumer callback mechanism from 
the KafkaBasedLog).
   - When a tombstone offset (i.e. Kafka message with a null value) is 
encountered for a source partition, the map is simply updated to make the value 
null for the corresponding partition key. For certain source connectors which 
have a lot of source partitions that are "closed" frequently, this can be very 
problematic. Imagine a file source connector which reads data from all files in 
a directory line-by-line (and where file appends are not tracked) - each file 
corresponds to a source partition here, and the offset would be the line number 
in the file. If there are millions of files being read, this can bring down the 
Connect worker due to JVM heap exhaustion (OOM) caused by the in-memory map in 
KafkaOffsetBackingStore growing too large.
   - Even if the connector writes tombstone offsets for the last record in a 
source partition, this doesn't help completely since we don't currently remove 
entries from KafkaOffsetBackingStore's in-memory offset map (so the source 
partition keys will stick around in the map), even though we indicate here that 
tombstones can be used to "delete" offsets - 
https://github.com/apache/kafka/blob/56d588d55ac313c0efca586a3bcd984c99a89018/connect/runtime/src/main/java/org/apache/kafka/connect/storage/OffsetUtils.java#L37
   - This shouldn't have any other side effects because 
[Map::get](https://docs.oracle.com/javase/8/docs/api/java/util/Map.html#get-java.lang.Object-)
 returns `null` if the map contains no mapping for a key.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to