nwangtw commented on issue #3198: Feature/create kafka spout
URL: https://github.com/apache/incubator-heron/pull/3198#issuecomment-474496360
 
 
   > > Interesting. Is topic -> stream 1 to 1? Why not have multiple spout components, with each one responsible for one topic?
   > 
   > Topic -> Stream is 1 -> 1 in our use case, but in Storm's implementation, it can be configured to be many -> 1 as well.
   > Running multiple spouts adds the overhead of running more Heron instances and Kafka consumers. We operate in a resource-constrained environment, so that is not always feasible.
   
   Got it. Thanks. many -> 1 does sound useful in a resource-constrained env. It should be doable, but there might be extra logic and config. I am thinking that maybe it can be created as a separate spout that shares code with this 1:1 version.
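   
   Just to sketch what I mean by many -> 1 (this is only an illustration against the plain kafka-clients API; the `Collector` interface below is a stand-in for the spout's output collector, and the topic/stream names are made up): one consumer subscribed to several topics can funnel everything into a single output stream, so only one Heron instance and one Kafka consumer are needed.

```java
// Illustration only -- not the API in this PR. One KafkaConsumer subscribed to
// several topics, with every record emitted to a single output stream.
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManyTopicsOneStreamSketch {
  /** Stand-in for the spout's output collector. */
  interface Collector {
    void emit(String streamId, Object value);
  }

  public static void run(Properties consumerProps, Collector collector) {
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
      // many topics -> one consumer -> one Heron instance
      consumer.subscribe(Arrays.asList("clicks", "impressions", "conversions"));
      while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(200));
        for (ConsumerRecord<String, String> record : records) {
          // every record, regardless of its topic, goes to the same output stream
          collector.emit("events", record.value());
        }
      }
    }
  }
}
```

   With something like that, mapping several topics to one stream is mostly a config concern; the extra logic is mainly in how offsets are tracked per topic-partition.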
   
   > 
   > > Very good question. I actually started with a "one-record-to-many-tuple" implementation, then gave it more thought when I was implementing the `ATLEAST_ONCE` delivery guarantee. Allowing "one-record-to-many-tuple" would significantly complicate the acknowledgement-tracking algorithm, because then we have to track the mapping between a single Kafka record offset and multiple message IDs.
   > > And then we also face a design choice: should the KafkaSpout itself decide the uniqueness of the set of message IDs coming from the same ConsumerRecord, or should we open that choice up to the developer?
   > > So, a neater choice is to use multiple KafkaSpouts, each dedicated to an output stream.
   > > But I do agree that "one-record-to-many-tuple" is pretty useful and cost-effective in terms of resource consumption. I have no objection to putting it back in, but then it becomes the developer's responsibility to avoid emitting multiple tuples out of one ConsumerRecord (only in `ATLEAST_ONCE` mode), at least for this version of the KafkaSpout, until we introduce a more complicated ack/fail tracking mechanism.
   > 
   > Agreed that one record -> many tuples is really useful, but it does make tracking offsets a lot harder. However, I don't think that is possible with the current Kafka spout implementation from Storm: it only allows 1 record -> 1 tuple emission, but to any declared stream.
   > Having the ability to choose the stream for a given record would be a good start for the Heron Kafka spout, and it's much easier to implement.
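   
   Just to make sure I read the "choose the stream given a record" part correctly, the idea is roughly something like this (the interface and names below are made up for illustration, not from this PR): still 1 record -> 1 tuple, but the spout asks a user-supplied mapper which declared stream the tuple should go to.

```java
// Illustration only: a made-up hook for picking the output stream per record.
// The spout still emits exactly one tuple per ConsumerRecord.
import org.apache.kafka.clients.consumer.ConsumerRecord;

public interface RecordToStreamMapper {
  /** Returns the declared stream id that the tuple built from this record should go to. */
  String streamFor(ConsumerRecord<String, String> record);
}

// Example: route by topic (could just as well inspect the key or value).
class TopicBasedMapper implements RecordToStreamMapper {
  @Override
  public String streamFor(ConsumerRecord<String, String> record) {
    return record.topic().startsWith("clicks") ? "clicks-stream" : "default-stream";
  }
}

// In nextTuple() the spout would then do something like:
//   collector.emit(mapper.streamFor(record), tuple, messageId);
```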
   
   Are these streams the same, or are they different (e.g. with different filters/transforms applied)? If they are the same, what is the difference between that and a spout with one output stream that multiple bolts register to? Different offsets?
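   
   Side note on the ack-tracking point above: here is a rough sketch (all class/method names are mine, not from this PR) of the extra bookkeeping that one record -> many tuples would force on `ATLEAST_ONCE`. The spout has to remember every message ID spawned from an offset and can only treat the offset as committable once all of them are acked.

```java
// Sketch of the bookkeeping one-record-to-many-tuples would require under
// ATLEAST_ONCE. Class/method names are illustrative, not from this PR.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MultiTupleAckTracker {
  // For each record offset (per partition in a real spout), the message IDs
  // that were emitted from it and are still waiting for an ack.
  private final Map<Long, Set<Object>> pendingByOffset = new HashMap<>();

  /** Called once per tuple emitted from the record at this offset. */
  public void onEmit(long offset, Object messageId) {
    pendingByOffset.computeIfAbsent(offset, k -> new HashSet<>()).add(messageId);
  }

  /**
   * Called from ack(). Returns true only when every tuple emitted from the
   * offset has been acked, i.e. the offset is finally safe to commit.
   */
  public boolean onAck(long offset, Object messageId) {
    Set<Object> pending = pendingByOffset.get(offset);
    if (pending == null) {
      return false;
    }
    pending.remove(messageId);
    if (pending.isEmpty()) {
      pendingByOffset.remove(offset);
      return true;
    }
    return false;
  }

  /** Called from fail(): a single failed tuple forces the whole record to be replayed. */
  public void onFail(long offset) {
    pendingByOffset.remove(offset);
    // ...and the spout would have to re-emit every tuple for this offset,
    // which is exactly the complexity avoided by keeping 1 record -> 1 tuple.
  }
}
```

   With the 1 record -> 1 tuple restriction, none of this is needed: the message ID can simply carry the topic/partition/offset, and a single ack marks that offset as done.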
