[ 
https://issues.apache.org/jira/browse/KAFKA-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952954#comment-16952954
 ] 

Ryanne Dolan commented on KAFKA-6080:
-------------------------------------

My goal is to eliminate duplicates downstream of a SourceConnector. The records 
returned by poll() should be stored in Kafka exactly as they come -- in the 
same order, exactly once, no additional dupes introduced by the worker. This is 
mostly straightforward.

Here's my working hypothesis:
- Workers get producer IDs from their Herder, based on the task ID.
- Workers send, flush, commit transactionally.
- When a Task fails, the Worker aborts the transaction.
- When a Worker fails, any outstanding transactions time out.
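
A minimal sketch of the send/commit/abort steps above, under stated assumptions. 
TxProducer and InMemoryLog are hypothetical stand-ins, not the Kafka client API; 
they only mirror the shape of KafkaProducer's beginTransaction(), 
commitTransaction() and abortTransaction() calls. The transactionalIdFor() 
naming scheme is also an assumption, and the worker-failure timeout is not 
modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TransactionalSourceSketch {

    /** Hypothetical stand-in for a transactional producer (not the Kafka API). */
    interface TxProducer {
        void beginTransaction();
        void send(String record);
        void commitTransaction();
        void abortTransaction();
    }

    /** In-memory log: records become visible to readers only on commit. */
    static final class InMemoryLog implements TxProducer {
        final String transactionalId;
        final List<String> committed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();

        InMemoryLog(String transactionalId) {
            this.transactionalId = transactionalId;
        }

        @Override public void beginTransaction() { pending.clear(); }
        @Override public void send(String record) { pending.add(record); }
        @Override public void commitTransaction() {
            committed.addAll(pending);
            pending.clear();
        }
        @Override public void abortTransaction() { pending.clear(); }
    }

    /** Assumed naming scheme: one stable ID per connector task. */
    static String transactionalIdFor(String connector, int taskId) {
        return "connect-" + connector + "-" + taskId;
    }

    /** Writes one poll() batch atomically; returns true if it committed. */
    static boolean writeBatch(TxProducer producer, List<String> batch) {
        producer.beginTransaction();
        try {
            for (String record : batch) {
                // A null record simulates the task failing mid-batch.
                if (record == null) {
                    throw new IllegalStateException("task failed mid-batch");
                }
                producer.send(record);
            }
            producer.commitTransaction();
            return true;
        } catch (RuntimeException e) {
            // Task failure: abort so no partial batch becomes visible.
            producer.abortTransaction();
            return false;
        }
    }

    public static void main(String[] args) {
        InMemoryLog log = new InMemoryLog(transactionalIdFor("demo", 0));
        writeBatch(log, Arrays.asList("a", "b"));   // commits
        writeBatch(log, Arrays.asList("c", null));  // fails, aborts
        System.out.println(log.committed);          // prints [a, b]
    }
}
```

The second batch fails after "c" has been sent, yet "c" never reaches the 
committed list, which is the property the worker would need from the real 
producer.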

No API changes are required for this, afaict. However, I may end up completely 
reimplementing WorkerSourceTask to get this right.

I'm happy to collaborate or yield if you guys want to pick this up; otherwise 
I'll put a KIP together very soon.

> Transactional EoS for source connectors
> ---------------------------------------
>
>                 Key: KAFKA-6080
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6080
>             Project: Kafka
>          Issue Type: New Feature
>          Components: KafkaConnect
>            Reporter: Antony Stubbs
>            Assignee: Ryanne Dolan
>            Priority: Major
>              Labels: needs-kip
>
> Exactly once (eos) message production for source connectors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)