[ 
https://issues.apache.org/jira/browse/KAFKA-13601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508566#comment-17508566
 ] 

Anil Dasari edited comment on KAFKA-13601 at 3/18/22, 4:52 AM:
---------------------------------------------------------------

Filed partitioner uses starting offset of the batch in the file name and is the 
only varying parameter. There will be only one parquet file per worker 
(consumer/partition) (that is out of sync with offsets) present in destination 
(s3 in my case) if worker dies before committing an offset. So new or restarted 
worker would override that parquet file as start offset of the partition remans 
same. 

Please let me know if you have any questions. 


was (Author: JIRAUSER283879):
File name in the filed partitioner uses starting offset of the batch and there 
will be only one parquet file per worker (consumer/partition) present in 
destination (s3 in my case) if worker dies before committing an offset. So new 
or restarted worker would override that parquet file as start offset of the 
partition remans same. 

Please let me know if you have any questions. 

> Add option to support sync offset commit in Kafka Connect Sink
> --------------------------------------------------------------
>
>                 Key: KAFKA-13601
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13601
>             Project: Kafka
>          Issue Type: New Feature
>          Components: KafkaConnect
>            Reporter: Anil Dasari
>            Priority: Major
>
> Exactly once in s3 connector with scheduled rotation and field partitioner 
> can be achieved with consumer offset sync' commit after message batch flushed 
> to sink successfully
> Currently, WorkerSinkTask committing the consumer offsets asynchronously and 
> at regular intervals of WorkerConfig.OFFSET_COMMIT_INTERVAL_MS_CONFIG 
> [https://github.com/apache/kafka/blob/371f14c3c12d2e341ac96bd52393b43a10acfa84/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L203]
> [https://github.com/apache/kafka/blob/371f14c3c12d2e341ac96bd52393b43a10acfa84/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L196]
> [https://github.com/apache/kafka/blob/371f14c3c12d2e341ac96bd52393b43a10acfa84/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L354]
>  
> Add config to allow user to select synchronous commit over 
> WorkerConfig.OFFSET_COMMIT_INTERVAL_MS_CONFIG  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to