simingweng commented on issue #3198: Feature/create kafka spout
URL: https://github.com/apache/incubator-heron/pull/3198#issuecomment-474463341
 
 
   > Hi, thanks for this PR, happy to finally see a Kafka Spout implementation 
in Heron. I am planning on using this once it is merged, but I have a question. 
One major difference between this implementation and the Storm one is that 
Storm's spout allows emitting to different streams, using the 
`org.apache.storm.kafka.spout.RecordTranslator` interface. This implementation 
is missing this particular functionality, which is quite useful.
   > 
   > Is there a reason for not keeping this functionality in this Kafka Spout 
implementation? Or is there another way to achieve similar functionality (other 
than creating a map-function like bolt for this purpose)? It's really useful 
for sending data from different topics to different downstream bolts.
   
   Very good question. I actually started with a "one-record-to-many-tuple" 
implementation, but I reconsidered it while implementing the 
`ATLEAST_ONCE` delivery guarantee. Allowing "one-record-to-many-tuple" would 
significantly complicate the algorithm that tracks acknowledgements, because 
we would then have to track the mapping from a single Kafka record 
offset to multiple message IDs.
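   
   To illustrate the extra bookkeeping, here is a minimal sketch of what 
tracking "one offset to many message IDs" could look like. It is not code from 
this PR: the class `FanOutAckTracker` and its methods are hypothetical, message 
IDs are assumed to be unique strings supplied by the caller, and a failed tuple 
is assumed to force a replay of the whole record.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not part of the PR): ack tracking when one record fans out to many tuples. */
final class FanOutAckTracker {
    // Kafka record offset -> message IDs of its tuples that are still un-acked
    private final Map<Long, Set<String>> pending = new HashMap<>();
    // message ID -> offset of the record it was emitted from
    private final Map<String, Long> offsetOf = new HashMap<>();

    /** Register every message ID emitted for one record before any ack can arrive. */
    void register(long offset, Collection<String> messageIds) {
        pending.put(offset, new HashSet<>(messageIds));
        for (String id : messageIds) {
            offsetOf.put(id, offset);
        }
    }

    /** Returns true only when this ack completes the record, so its offset may be committed. */
    boolean ack(String messageId) {
        Long offset = offsetOf.remove(messageId);
        if (offset == null) {
            return false; // unknown or already resolved message ID
        }
        Set<String> remaining = pending.get(offset);
        remaining.remove(messageId);
        if (remaining.isEmpty()) {
            pending.remove(offset);
            return true;
        }
        return false;
    }

    /** Returns the offset to replay, or null if the ID is unknown; sibling IDs are dropped. */
    Long fail(String messageId) {
        Long offset = offsetOf.remove(messageId);
        if (offset == null) {
            return null;
        }
        // Assumption: one failed tuple invalidates all tuples of the same record.
        for (String sibling : pending.remove(offset)) {
            offsetOf.remove(sibling);
        }
        return offset;
    }
}
```

   With a one-record-to-one-tuple spout, both maps collapse into a single set of 
pending offsets, which is the simplification the current implementation relies on.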
   
   We would also face a design choice: should the KafkaSpout itself 
decide how the set of message IDs coming from the same ConsumerRecord is made 
unique, or should that choice be left to the developer?
   
   So a neater choice is to use multiple KafkaSpout instances, each dedicated 
to one output stream.
   
   That said, I do agree that "one-record-to-many-tuple" is pretty useful and 
cost-effective in terms of resource consumption. I have no objection to putting 
it back in, but it then becomes the developer's responsibility to avoid 
emitting multiple tuples from one ConsumerRecord in `ATLEAST_ONCE` mode (the 
restriction applies only in that mode).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
