[
https://issues.apache.org/jira/browse/SAMZA-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jae Hyeon Bae resolved SAMZA-541.
---------------------------------
Resolution: Won't Fix
> Passing coordinator to producer chains
> --------------------------------------
>
> Key: SAMZA-541
> URL: https://issues.apache.org/jira/browse/SAMZA-541
> Project: Samza
> Issue Type: Improvement
> Components: container
> Reporter: Jae Hyeon Bae
> Assignee: Jae Hyeon Bae
>
> StreamTask can control SamzaContainer.commit() through the task coordinator,
> but SystemProducer can call flush() without a commit(), which can create
> duplicate data on container failure. For example,
> T=30s; flush
> T=45s; flush
> T=60s; flush && commit
> T=65s; flush
> "If there's a failure before 60s, the messages that were flushed at 30s and
> 45s will be duplicated when the container reprocesses" (quoting
> [~criccomini]'s response).
> If SystemProducer could access the TaskCoordinator created by RunLoop in
> SamzaContainer, we would have the flexibility to control 'exactly-once delivery'.
> The following interface should be changed:
> {code}
> MessageCollector.send(envelope, coordinator)
> SystemProducers.send(source, envelope, coordinator)
> SystemProducer.send(source, envelope, coordinator)
> {code}
> Kafka 0.8's new Java producer cannot be synchronized with TaskInstance.commit
> because it does not batch-flush. Whether 'exactly-once delivery' can be
> guaranteed therefore depends on the SystemProducer implementation.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)