rohanag12 commented on issue #3198: Feature/create kafka spout
URL: https://github.com/apache/incubator-heron/pull/3198#issuecomment-474478820

> Interesting. Is topic -> stream 1 to 1? Why not have multiple spout components and each one is responsible for one topic?

Topic -> Stream is 1 -> 1 in our use case, but in Storm's implementation, it can be configured to be many -> 1 as well. Running multiple spouts adds the overhead of running more Heron instances and Kafka consumers. We operate in a resource constrained environment, so that is not always feasible.

> very good question. I actually started with a "one-record-to-many-tuple" implementation, then I gave it a deep thought when I was implementing the `ATLEAST_ONCE` delivery guarantee. Allowing "one-record-to-many-tuple" will significantly complicate the algorithm to track acknowledgement, because then we have to keep tracking the mapping relationship between a single Kafka record offset to multiple message IDs.
>
> And then we also face a design choice whether the KafkaSpout itself should decide the uniqueness of a set of Message IDs coming from the same ConsumerRecord, or we should open the choice up to the developer?
>
> So, a neater choice is to use multiple KafkaSpout, each dedicated to an output stream.
>
> But, I do agree "one-record-to-many-tuple" is pretty useful and cost effective in terms of resource consumption. I have no obligation to put it back in, but then it becomes the developer's responsibility to make sure avoid emitting multiple tuples out of one ConsumerRecord ONLY in `ATLEAST_ONCE` mode, at least for this version of KafkaSpout before we introduce a more complicated ack/fail tracking mechanism.

Agreed that one record -> many tuples is really useful, but it does make tracking offsets a lot harder. However, I don't think that is possible using the current Kafka spout implementation from Storm - it only allows 1 record -> 1 tuple emits, but to any declared stream.

Having this ability to choose the stream given a record would be a good start for the Heron Kafka spout, and it's much easier to implement.
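
To make the "choose the stream given a record" idea concrete, here is a rough sketch in plain Java. It is not the Storm or Heron API: `Record`, `StreamSelector`, `TopicStreamSelector`, and the topic and stream names are all hypothetical stand-ins invented for illustration. Each record still produces exactly one tuple, so offset tracking stays 1 -> 1, while several topics can share a stream (many -> 1).

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: none of these types come from Heron, Storm, or Kafka.
class StreamSelectionSketch {

    /** Hypothetical stand-in for a Kafka ConsumerRecord, reduced to the fields used here. */
    static final class Record {
        final String topic;
        final long offset;
        final String value;

        Record(String topic, long offset, String value) {
            this.topic = topic;
            this.offset = offset;
            this.value = value;
        }
    }

    /** Hypothetical hook: returns exactly one declared stream for each record. */
    interface StreamSelector {
        String streamFor(Record record);
    }

    /** Selects a stream by topic; topics without a mapping go to a fallback stream. */
    static final class TopicStreamSelector implements StreamSelector {
        private final Map<String, String> topicToStream;
        private final String fallbackStream;

        TopicStreamSelector(Map<String, String> topicToStream, String fallbackStream) {
            // Defensive copy so later changes to the caller's map do not affect routing.
            this.topicToStream = new HashMap<>(topicToStream);
            this.fallbackStream = fallbackStream;
        }

        @Override
        public String streamFor(Record record) {
            return topicToStream.getOrDefault(record.topic, fallbackStream);
        }
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("orders", "orders-stream");
        mapping.put("refunds", "orders-stream");     // many topics -> one stream
        mapping.put("payments", "payments-stream");  // one topic -> one stream

        StreamSelector selector = new TopicStreamSelector(mapping, "default");

        Record[] records = {
            new Record("orders", 42L, "order payload"),
            new Record("refunds", 7L, "refund payload"),
            new Record("audit", 3L, "audit payload"),
        };
        for (Record record : records) {
            // One record -> one tuple, so the record offset alone can serve as the message ID.
            System.out.println(record.topic + "@" + record.offset
                + " -> " + selector.streamFor(record));
        }
    }
}
```

In a real spout, the selected stream name would be passed to the emit call together with a message ID derived from the record's topic, partition, and offset. That wiring is left out here because it depends on the spout's actual interfaces.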
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
