[ https://issues.apache.org/jira/browse/KAFKA-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952954#comment-16952954 ]
Ryanne Dolan commented on KAFKA-6080:
-------------------------------------
My goal is to eliminate duplicates downstream of a SourceConnector. The records
returned by poll() should be stored in Kafka exactly as they come -- in the
same order, exactly once, no additional dupes introduced by the worker. This is
mostly straightforward.
Here's my working hypothesis:
- Workers get producer IDs from their Herder, based on the task ID.
- Workers send, flush, and commit transactionally.
- When a Task fails, the Worker aborts the transaction.
- When a Worker fails, any outstanding transactions time out.
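
To make the hypothesis above concrete, here is a rough, self-contained sketch of the per-task loop. It is not Connect code: TxProducer, InMemoryProducer, SourceTask, transactionalId() and pollOnce() are all hypothetical names invented for illustration, and the "connect-<connector>-<task>" ID format is an assumption. The begin/commit/abort method names mirror the ones on KafkaProducer, but the producer here is an in-memory stand-in. The worker-failure case (transactions timing out on the broker) is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

class EosSketch {
    /** Hypothetical stand-in for the transactional calls on a Kafka producer. */
    interface TxProducer {
        void beginTransaction();
        void send(String record);
        void commitTransaction();
        void abortTransaction();
    }

    /** In-memory producer: records reach `log` only when the transaction commits. */
    static class InMemoryProducer implements TxProducer {
        final String transactionalId;
        final List<String> log = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();

        InMemoryProducer(String transactionalId) {
            this.transactionalId = transactionalId;
        }

        @Override public void beginTransaction() { pending.clear(); }

        @Override public void send(String record) {
            if (record == null) {
                throw new IllegalArgumentException("null record");
            }
            pending.add(record);
        }

        @Override public void commitTransaction() {
            log.addAll(pending); // all records become visible together, in order
            pending.clear();
        }

        @Override public void abortTransaction() { pending.clear(); }
    }

    /** Hypothetical, simplified source task: poll() returns a batch or throws. */
    interface SourceTask {
        List<String> poll() throws Exception;
    }

    /** Assumption: a stable ID derived from the task ID, e.g. handed out by the Herder. */
    static String transactionalId(String connector, int task) {
        return "connect-" + connector + "-" + task;
    }

    /** One iteration: a batch is stored entirely, in order, or not at all. */
    static boolean pollOnce(SourceTask task, TxProducer producer) {
        producer.beginTransaction();
        try {
            for (String record : task.poll()) {
                producer.send(record);
            }
            producer.commitTransaction();
            return true;
        } catch (Exception e) {
            // Task failure: abort so no partial batch becomes visible.
            producer.abortTransaction();
            return false;
        }
    }

    public static void main(String[] args) {
        InMemoryProducer producer = new InMemoryProducer(transactionalId("demo", 0));
        pollOnce(() -> List.of("a", "b"), producer);
        pollOnce(() -> { throw new RuntimeException("task failed"); }, producer);
        System.out.println(producer.transactionalId + " " + producer.log);
    }
}
```

In a real worker the source offsets would presumably need to be committed within the same transaction as the records; the sketch leaves that out.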
No API changes are required for this, afaict. However, I may end up completely
reimplementing WorkerSourceTask to get this right.
I'm happy to collaborate or yield if you guys want to pick this up; otherwise
I'll put a KIP together very soon.
> Transactional EoS for source connectors
> ---------------------------------------
>
> Key: KAFKA-6080
> URL: https://issues.apache.org/jira/browse/KAFKA-6080
> Project: Kafka
> Issue Type: New Feature
> Components: KafkaConnect
> Reporter: Antony Stubbs
> Assignee: Ryanne Dolan
> Priority: Major
> Labels: needs-kip
>
> Exactly once (eos) message production for source connectors.