Re: Topic regex for kafka-indexing-service

Gian Merlino Tue, 19 Feb 2019 11:25:00 -0800

Hey Prabhat,

We wrote up a blog post a couple years back discussing the design:
https://imply.io/post/exactly-once-streaming-ingestion. A few of the key
PRs are:

- https://github.com/apache/incubator-druid/pull/2220 (original PR adding
the KafkaIndexTask)
- https://github.com/apache/incubator-druid/pull/2656 (original PR adding
the KafkaSupervisor, completing the feature)
- https://github.com/apache/incubator-druid/pull/4815 (PR updating both to
support incremental handoffs, a major design change)

As for the complexities involved in reading from multiple topics into a
single datasource, the main area to look at would be KafkaDataSourceMetadata /
SeekableStreamDataSourceMetadata and all the code that tracks stream
metadata (look for usages of those classes). Most of it assumes that each
datasource is reading from only a single topic. We wouldn't need to give up
any features or guarantees -- we'd just need to modify things from 1-1 to
1-many.
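
To make the 1-1 versus 1-many distinction concrete, here is a rough sketch
of what per-datasource offset tracking might look like once it is keyed by
topic. Everything in it is hypothetical: MultiTopicOffsets, plus, matches,
and offsetFor are illustrative names and are not the actual
KafkaDataSourceMetadata code. It also assumes the tracked state boils down
to a map of partition to next offset, which the real classes may not match
exactly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT Druid's KafkaDataSourceMetadata.
// Single-topic tracking would be one topic name plus a partition -> offset map.
// The 1-many version adds an outer map keyed by topic.
final class MultiTopicOffsets {
    // topic -> (partition -> next offset to read)
    private final Map<String, Map<Integer, Long>> offsets;

    MultiTopicOffsets(Map<String, Map<Integer, Long>> offsets) {
        // Deep copy so callers cannot mutate our state afterwards.
        Map<String, Map<Integer, Long>> copy = new HashMap<>();
        for (Map.Entry<String, Map<Integer, Long>> e : offsets.entrySet()) {
            copy.put(e.getKey(), new HashMap<>(e.getValue()));
        }
        this.offsets = copy;
    }

    // Merge another set of offsets into this one; on conflict, 'other' wins.
    // Returns a new object and leaves both inputs untouched.
    MultiTopicOffsets plus(MultiTopicOffsets other) {
        MultiTopicOffsets merged = new MultiTopicOffsets(this.offsets);
        for (Map.Entry<String, Map<Integer, Long>> e : other.offsets.entrySet()) {
            merged.offsets
                .computeIfAbsent(e.getKey(), k -> new HashMap<>())
                .putAll(e.getValue());
        }
        return merged;
    }

    // True when both sides track exactly the same topics, partitions, and offsets.
    boolean matches(MultiTopicOffsets other) {
        return offsets.equals(other.offsets);
    }

    // Returns the tracked offset, or null if the topic/partition is unknown.
    Long offsetFor(String topic, int partition) {
        Map<Integer, Long> byPartition = offsets.get(topic);
        return byPartition == null ? null : byPartition.get(partition);
    }
}
```

The point of the sketch is only that every comparison and merge has to
become topic-aware; anywhere the existing code compares or combines offsets
for "the" topic would need the same treatment.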

Gian

On Tue, Feb 19, 2019 at 6:54 AM Prabhat Gupta <prabha...@media.net> wrote:

> Hey all,
> Just a quick question: can someone point me to the mail thread and design
> doc for kafka-indexing-service? I wanted to understand what complexities it
> presents when adding support for reading from multiple topics in a single
> datasource/supervisor, and what goals were in mind when choosing this
> design. Since our use case absolutely requires this, I was wondering if we
> could change the code to achieve it, maybe by giving up some
> features/guarantees. I can't find any active discussion on this topic,
> so I am not sure if this is something being considered for future releases.
>
> Thank you very much
>
> --
> Prabhat Kumar Gupta
> Sr. Tech Lead, Data Eng.
> Media.net
> Ph.-9987776847
>
