nwangtw commented on issue #3198: Feature/create kafka spout
URL: https://github.com/apache/incubator-heron/pull/3198#issuecomment-471887499
 
 
   > @worlvlhole
   > Yes, I think you've got the most part. The KafkaSpout may be operated in 2 
different reliability mode, `ATMOST_ONCE` or `ATLEAST_ONCE`, (I haven't added 
`EFFECTIVE_ONCE` implementation yet, I'm working on it).
   > 
   > ### `ATMOST_ONCE` mode
   > the whole topology will not turn the `acking` mechanism on. so, the 
KafkaSpout can afford to emit the tuple without any message id, and it also 
immediately commit the currently-read offset back to Kafka broker, and neither 
`ack()` nor `fail()` callback will be invoked. Therefore, "in-flight" tuple 
will just get lost in case the KafkaSpout instance is blown up or the topology 
is restarted. That's what `ATMOST_ONCE` offers.
   > 
   > ### `ATLEAST_ONCE` mode
   > the `acking` mechanism is turned on topology-wise, so the KafkaSpout uses 
the `ack registry` to keep tracking all the **continuous** acknowledgement 
ranges for each partition, while the `failure registry` keeps tracking the 
**lowest** failed acknowledgement for each partition. When it comes to the time 
that the Kafka Consumer needs to poll the Kafka cluster for more records 
(because it's emitted everything it got from the previous poll), then the 
KafkaSpout reconciles as following for each partition that it is consuming:
   > 
   > 1. if there's any failed tuple, seek back to the lowest corresponding 
offset
   > 2. discard all the acknowledgements that it's received but is greater than 
the lowest failed offset
   > 3. clear the lowest failed offset in `failure registry`
   > 4. commit the offset to be the upper boundary of the first range in the 
`ack registry`
   > 
   > So, it guarantees each tuple emitted by the KafkaSpout must be 
successfully processed across the whole topology at least once.
   > 
   > ### Not Implemented
   > What is missing in this Kafka Spout implementation now is to handle the 
`EFFECTIVE_ONCE` scenario, which should completely rely on the `checkpointing` 
mechanism to decide how far it needs to rewind back. I'm working on it right 
now.
   > 
   > I know this is quite some information, I'm writing README to explain 
things in more details, will keep updating the pull request.
   
   This information might be included in the document?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to