Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/1919
@srdo @XuMingmin
If my understanding is right, at-most-once can be guaranteed with these
steps:
1. pull the data from the datasource
2. send the ack to the datasource
3. emit the data downstream
If we loosen this by assuming there will be no crash between emitting the
data and sending the ack to the datasource, we can swap steps 2 and 3, and
that's what we're often referring to.
So yes, case 3 should explicitly ack to the datasource, and data should be
emitted only when sending the ack succeeds. I'm not familiar with the new
Kafka API, but if `KafkaConsumer.commitSync` guarantees the ack, we should
use it for case 3.
Please correct me if I'm missing something here.
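
To make the ordering concrete, here is a minimal Java sketch of the three
steps (pull, ack, then emit). It does not use the Kafka client at all:
`DataSource`, `process`, and `runWithCrash` are hypothetical names made up
for illustration, and the "crash" is simulated with an exception thrown
between the ack and the emit. Under those assumptions, a crashed item is
lost after restart but never emitted twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AtMostOnceSketch {
    /** Hypothetical in-memory datasource; acked items are never redelivered. */
    static class DataSource {
        private final List<String> items;
        private int ackedCount = 0;

        DataSource(List<String> items) { this.items = new ArrayList<>(items); }

        /** Step 1: return the next unacked item, or null when none remain. */
        String pull() { return ackedCount < items.size() ? items.get(ackedCount) : null; }

        /** Step 2: ack the item that was just pulled. */
        void ack() { ackedCount++; }
    }

    /** Runs pull -> ack -> emit; "crashes" after the ack of crashOn, before its emit. */
    static void process(DataSource source, List<String> downstream, String crashOn) {
        String item;
        while ((item = source.pull()) != null) {
            source.ack();                      // step 2: ack first
            if (item.equals(crashOn)) {
                throw new IllegalStateException("simulated crash before emitting " + item);
            }
            downstream.add(item);              // step 3: emit only after the ack
        }
    }

    /** Processes the input, crashes once on crashOn, then restarts from the acked position. */
    static List<String> runWithCrash(List<String> input, String crashOn) {
        DataSource source = new DataSource(input);
        List<String> downstream = new ArrayList<>();
        try {
            process(source, downstream, crashOn);
        } catch (IllegalStateException crash) {
            process(source, downstream, null); // restart: resume after the last ack
        }
        return downstream;
    }

    public static void main(String[] args) {
        // "b" is acked but never emitted, so it is dropped rather than duplicated.
        System.out.println(runWithCrash(Arrays.asList("a", "b", "c"), "b")); // prints [a, c]
    }
}
```

Swapping the `ack()` and `downstream.add(...)` calls in this sketch would
re-emit `b` after the restart, which is why that order only holds if no
crash happens between the two steps.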