simingweng edited a comment on issue #3198: Feature/create kafka spout
URL: https://github.com/apache/incubator-heron/pull/3198#issuecomment-468958600
 
 
   @worlvlhole 
   Yes, I think you've got the most part. The KafkaSpout may be operated in 2 
different reliability mode, `ATMOST_ONCE` or `ATLEAST_ONCE`, (I haven't added 
`EFFECTIVE_ONCE` implementation yet, I'm working on it).
   
   In `ATMOST_ONCE` mode, the whole topology will not turn the `acking` 
mechanism on. so, the KafkaSpout can afford to emit the tuple without any 
message id, and it also immediately commit the currently-read offset back to 
Kafka broker, the neither `ack()` nor `fail()` callback will be invoked. 
Therefore, "in-flight" tuple will just get lost in case the KafkaSpout instance 
is blown up or the topology is restarted. That's what `ATMOST_ONCE` offers.
   
   In `ATLEAST_ONCE` mode, the `acking` mechanism is turned on topology-wise, 
so the KafkaSpout uses the `ack registry` to keep tracking all the 
**continuous** acknowledgement ranges for each partition, while the `failure 
registry` keeps tracking the **lowest** failed acknowledgement for each 
partition. When it comes to the time that the Kafka Consumer needs to poll the 
Kafka cluster for more records (because it's emitted everything it gets from 
the previous poll), then the KafkaSpout reconciles as following for each 
partition that it is consuming:
   
   1. if there's any failed tuple, seek back to the lowest corresponding offset
   2. discard all the acknowledgements that it's received but is greater than 
the lowest failed offset
   3. clear the lowest failed offset in `failure registry`
   4. commit the offset to be the upper boundary of the first range in the `ack 
registry`
   
   What is missing in this Kafka Spout implementation now is to handle the 
`EFFECTIVE_ONCE` scenario, which should completely rely on the `checkpointing` 
mechanism to decide how far it needs to rewind back. I'm working on it right 
now.
   
   I know this is quite some information, I'm writing README to explain things 
in more details, will keep updating the pull request.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to