[GitHub] [incubator-heron] rohanag12 commented on issue #3198: Feature/create kafka spout

GitBox Tue, 19 Mar 2019 10:21:40 -0700

rohanag12 commented on issue #3198: Feature/create kafka spout
URL: https://github.com/apache/incubator-heron/pull/3198#issuecomment-474478820
 
 
   > Interesting. Is topic -> stream 1 to 1? Why not have multiple spout components and each one is responsible for one topic?
   
   Topic -> Stream is 1 -> 1 in our use case, but in Storm's implementation it can also be configured as many -> 1.
   Running multiple spouts adds the overhead of running more Heron instances and more Kafka consumers. We operate in a resource-constrained environment, so that is not always feasible.
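
A minimal sketch of the single-spout alternative, assuming a plain topic-to-stream table is enough: several topics may map onto one stream (many -> 1), and each record still goes to exactly one stream. `TopicStreamRouter` and its methods are hypothetical names for illustration, not part of the Heron or Storm API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical routing table for one Kafka spout consuming several topics.
// Several topics may share a stream (many -> 1); a topic never fans out.
final class TopicStreamRouter {
    private final Map<String, String> topicToStream;

    TopicStreamRouter(Map<String, String> topicToStream) {
        // Defensive copy so the routing cannot change after the spout starts.
        this.topicToStream = Collections.unmodifiableMap(new HashMap<>(topicToStream));
    }

    // The distinct streams the spout would have to declare up front.
    Set<String> declaredStreams() {
        return new HashSet<>(topicToStream.values());
    }

    // The single stream for records from the given topic; unknown topics are rejected.
    String streamFor(String topic) {
        String stream = topicToStream.get(topic);
        if (stream == null) {
            throw new IllegalArgumentException("No stream configured for topic " + topic);
        }
        return stream;
    }
}

final class TopicStreamRouterDemo {
    public static void main(String[] args) {
        Map<String, String> routes = new HashMap<>();
        routes.put("clicks", "events");  // two topics share one stream...
        routes.put("views", "events");
        routes.put("audit", "audit");    // ...and one topic gets its own
        TopicStreamRouter router = new TopicStreamRouter(routes);
        System.out.println(router.streamFor("views"));        // events
        System.out.println(router.declaredStreams().size());  // 2
    }
}
```

One spout instance and one consumer serve all three topics here, which is the resource saving described above.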
   
   > Very good question. I actually started with a "one-record-to-many-tuple" implementation, then I gave it deeper thought when I was implementing the `ATLEAST_ONCE` delivery guarantee. Allowing "one-record-to-many-tuple" would significantly complicate the algorithm for tracking acknowledgements, because we would then have to keep track of the mapping between a single Kafka record offset and multiple message IDs.
   > 
   > And then we also face a design choice: should the KafkaSpout itself decide the uniqueness of a set of message IDs coming from the same ConsumerRecord, or should we open that choice up to the developer?
   > 
   > So, a neater choice is to use multiple KafkaSpouts, each dedicated to one output stream.
   > 
   > But I do agree "one-record-to-many-tuple" is pretty useful and cost-effective in terms of resource consumption. I have no objection to putting it back in, but then it becomes the developer's responsibility to avoid emitting multiple tuples from one ConsumerRecord in `ATLEAST_ONCE` mode (and only in that mode), at least for this version of KafkaSpout, until we introduce a more complicated ack/fail tracking mechanism.
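
To make that bookkeeping concrete, here is a rough sketch of one-record-to-many-tuple tracking for a single partition under `ATLEAST_ONCE`. It assumes each tuple carries a hypothetical `MsgId` of (offset, attempt, index), that a failed record is replayed in full, and that the commit position is the lowest offset that still has unacked tuples. `FanOutAckTracker` and all its members are illustrative names, not Heron or Storm APIs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical per-partition ack tracker for records that fan out into several tuples.
final class FanOutAckTracker {
    // One ID per emitted tuple: source offset, emission attempt, tuple index.
    static final class MsgId {
        final long offset;
        final int attempt;
        final int index;

        MsgId(long offset, int attempt, int index) {
            this.offset = offset;
            this.attempt = attempt;
            this.index = index;
        }
    }

    // Per-record state: current attempt and the tuple indexes not yet acked.
    private static final class Pending {
        int attempt = -1;
        final Set<Integer> unacked = new HashSet<>();
    }

    private final TreeMap<Long, Pending> pending = new TreeMap<>();
    private long highWater;  // one past the highest offset emitted so far

    FanOutAckTracker(long startOffset) {
        this.highWater = startOffset;
    }

    // Registers (or re-registers after a failure) a record that produced
    // tupleCount tuples and returns one ID per tuple. Bumping the attempt
    // makes late acks from an earlier emission harmless.
    List<MsgId> emitted(long offset, int tupleCount) {
        if (tupleCount <= 0) {
            throw new IllegalArgumentException("tupleCount must be positive");
        }
        Pending p = pending.computeIfAbsent(offset, o -> new Pending());
        p.attempt++;
        p.unacked.clear();
        List<MsgId> ids = new ArrayList<>(tupleCount);
        for (int i = 0; i < tupleCount; i++) {
            p.unacked.add(i);
            ids.add(new MsgId(offset, p.attempt, i));
        }
        highWater = Math.max(highWater, offset + 1);
        return ids;
    }

    // A record is done only when every tuple of its current attempt is acked.
    void ack(MsgId id) {
        Pending p = pending.get(id.offset);
        if (p == null || p.attempt != id.attempt) {
            return;  // stale ack from an older attempt, or record already done
        }
        p.unacked.remove(id.index);
        if (p.unacked.isEmpty()) {
            pending.remove(id.offset);
        }
    }

    // True if the caller should replay the whole record now. The offset stays
    // pending, so the commit position cannot move past it in the meantime.
    boolean fail(MsgId id) {
        Pending p = pending.get(id.offset);
        return p != null && p.attempt == id.attempt;
    }

    // Kafka-style commit position (next offset to consume): the lowest offset
    // with unacked tuples, or one past the highest emitted offset.
    long committableOffset() {
        return pending.isEmpty() ? highWater : pending.firstKey();
    }
}
```

Replaying the whole record after a partial failure re-emits tuples that were already acked; at-least-once semantics tolerates that, but it shows why the spout (or the developer) has to own this mapping once fan-out is allowed.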
   
   Agreed that one record -> many tuples is really useful, but it does make tracking offsets a lot harder. However, I don't think that is possible with Storm's current Kafka spout implementation: it only emits one tuple per record, but that tuple can be routed to any declared stream.
   Being able to choose the output stream for each record would be a good start for the Heron Kafka spout, and it's much easier to implement.
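
Keeping one record -> one tuple while letting the developer pick the stream keeps acking simple: the record's (topic, partition, offset) can serve directly as the message ID. The sketch below assumes a hypothetical `StreamSelector` hook and uses `KafkaRecordStub` as a stand-in for Kafka's `ConsumerRecord` so it runs without the Kafka client; neither is the Storm or Heron API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a Kafka ConsumerRecord, so the sketch is self-contained.
final class KafkaRecordStub {
    final String topic;
    final int partition;
    final long offset;
    final String key;
    final String value;

    KafkaRecordStub(String topic, int partition, long offset, String key, String value) {
        this.topic = topic;
        this.partition = partition;
        this.offset = offset;
        this.key = key;
        this.value = value;
    }
}

// Hypothetical user hook: chooses the single output stream for a record.
@FunctionalInterface
interface StreamSelector {
    String selectStream(KafkaRecordStub record);
}

// Exactly one tuple per record: target stream, tuple values, and a message ID
// derived from the record's (topic, partition, offset).
final class Emission {
    final String stream;
    final List<Object> values;
    final String messageId;

    Emission(String stream, List<Object> values, String messageId) {
        this.stream = stream;
        this.values = values;
        this.messageId = messageId;
    }

    static Emission of(KafkaRecordStub r, StreamSelector selector) {
        String stream = Objects.requireNonNull(selector.selectStream(r), "selector returned null");
        String msgId = r.topic + "-" + r.partition + "@" + r.offset;
        return new Emission(stream, Arrays.<Object>asList(r.key, r.value), msgId);
    }
}

final class StreamSelectorDemo {
    public static void main(String[] args) {
        // Route on record content (key prefix) rather than on topic.
        StreamSelector byKey = r -> r.key != null && r.key.startsWith("err:") ? "errors" : "default";
        Emission e = Emission.of(new KafkaRecordStub("logs", 0, 42L, "err:disk", "full"), byKey);
        System.out.println(e.stream + " " + e.messageId);  // errors logs-0@42
    }
}
```

Because each offset maps to exactly one message ID, an ack or fail can be applied to the offset directly, with none of the per-record tuple counting that fan-out would require.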


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
