[I] Support secondary partitioning when ingesting from Kafka (druid)

via GitHub Thu, 12 Oct 2023 23:29:56 -0700


m-ghazanfar opened a new issue, #15149:
URL: https://github.com/apache/druid/issues/15149

### Description

We use Druid to store and query telemetry data. We ingest data into Druid
via Kafka. Our ingestion rate is around 1.7M messages per second.
Our query load is about `1000` qps, and most queries request data
in the time range [now-5, now-8].

Since we have a high ingestion rate, we produce a lot of segments. And since
our queries are for real-time data, Druid ends up querying all segments within
that time chunk.

In the graphs below, `726` queries on the broker translate to about
`66.25k` queries on the indexers, which is a fanout of about `92`.
We have `94` indexer nodes.

Our data has a `tenant` dimension. The `tenant` dimension is always used to
filter when performing a query.
We want to perform secondary partitioning based on the `tenant` dimension -
so that the broker can prune the segments which have to be queried.

Data for one `tenant` is limited to a few Kafka partitions (about 20). So,
with secondary partitioning, I would expect my fanout to be about 20,
as opposed to the 92 that I am seeing now.
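
To illustrate the effect I am expecting, here is a toy Python model of broker-side pruning. It is not Druid code: the `Segment` class, the `segments_to_query` helper, the tenant names, and the assumption of one real-time segment per indexer task are all hypothetical. It only shows that once each segment carries metadata about which tenants it can contain, a query filtered on `tenant` has to reach far fewer segments.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for one real-time segment served by an indexer task."""
    task_id: int
    # None means the segment has no partition metadata, so any tenant may be in it.
    tenants: Optional[FrozenSet[str]]


def segments_to_query(segments: List[Segment], tenant: str) -> List[Segment]:
    """Keep a segment unless its metadata proves the tenant cannot be in it."""
    return [s for s in segments if s.tenants is None or tenant in s.tenants]


# Assumed layout, loosely based on the numbers in this issue:
# 92 tasks serve the queried time chunk, and tenant "t1" lands in only 20 of them.
NUM_TASKS = 92
TASKS_WITH_TENANT = 20

# Today: no secondary partition metadata, so nothing can be pruned.
without_metadata = [Segment(i, None) for i in range(NUM_TASKS)]

# Proposed: each segment advertises which tenants it can contain.
with_metadata = [
    Segment(
        i,
        frozenset({"t1", f"other-{i}"}) if i < TASKS_WITH_TENANT
        else frozenset({f"other-{i}"}),
    )
    for i in range(NUM_TASKS)
]

fanout_before = len(segments_to_query(without_metadata, "t1"))
fanout_after = len(segments_to_query(with_metadata, "t1"))
print(f"fanout without metadata: {fanout_before}, with metadata: {fanout_after}")
```

A real implementation would presumably describe each segment with partition boundaries rather than an explicit set of tenants, but the pruning decision on the broker would have the same shape.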

I know that this can be done via compaction. However, I cannot make use of
compaction because our queries are for real-time data.

### Implementation
I do not have an implementation in mind but do wish to contribute the
implementation myself.

### Related
- https://github.com/apache/druid/issues/12929 : not the same as this
because I don't want to add the Kafka partition info
- https://imply.io/blog/multi-dimensional-range-partitioning/

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

