Hey Prabhat,

We wrote up a blog post a couple years back discussing the design: https://imply.io/post/exactly-once-streaming-ingestion. A few of the key PRs are:

- https://github.com/apache/incubator-druid/pull/2220 (original PR adding the KafkaIndexTask)
- https://github.com/apache/incubator-druid/pull/2656 (original PR adding the KafkaSupervisor, completing the feature)
- https://github.com/apache/incubator-druid/pull/4815 (PR updating both to support incremental handoffs, a major design change)

As to the complexities involved in reading from multiple topics into a single datasource, the main area to look at would be KafkaDataSourceMetadata / SeekableStreamDataSourceMetadata and all the things that track metadata (look for usages of those classes). Most of them assume that each datasource is reading from only a single topic. We wouldn't need to give up any features or guarantees -- we'd just need to modify things from 1-1 to 1-many.

Gian

On Tue, Feb 19, 2019 at 6:54 AM Prabhat Gupta <prabha...@media.net> wrote:

> Hey all,
> Just a quick question, can someone point me to the mail thread and design
> doc for kafka-indexing-service? I wanted to understand what complexities it
> presents, while adding support for reading from multiple topics in a single
> datasource/supervisor and what goals were in mind while choosing this
> design. Since our use case absolutely requires this, I was wondering if we
> could change the code to achieve this very thing by maybe giving up some
> features/guarantees. I can't find any alive discussion on this very topic,
> so I am not sure if this is something being considered in future releases.
>
> Thank you very much
>
> --
> Prabhat Kumar Gupta
> Sr. Tech Lead, Data Eng.
> Media.net
> Ph.-9987776847