m-ghazanfar opened a new issue, #15149: URL: https://github.com/apache/druid/issues/15149
### Description

We use Druid to store and query telemetry data. We ingest data into Druid via Kafka. Our ingestion rate is around 1.7M messages per second. Our query load is about `1000` qps, and most queries request data in the time range [now-5, now-8].

Since we have a high ingestion rate, we produce a lot of segments. And since our queries are for real-time data, Druid ends up querying all segments within that time chunk. The graphs below show that `726` queries on the broker translate to about `66.25k` queries on the indexers, a fanout of about `92`. We have `94` indexer nodes.

<img width="1651" alt="Screenshot 2023-10-13 at 10 54 56 AM" src="https://github.com/apache/druid/assets/88474681/844573ff-01cb-4f33-b7c0-a8f02a86d03a">
<img width="1657" alt="Screenshot 2023-10-13 at 11 56 17 AM" src="https://github.com/apache/druid/assets/88474681/9590b56a-96bf-49ed-93cc-3786e1d09f2c">

Our data has a `tenant` dimension, and every query filters on it. We want to perform secondary partitioning based on the `tenant` dimension so that the broker can prune the segments that have to be queried. The data of one `tenant` is limited to a few Kafka partitions (about 20). So, with secondary partitioning, I would expect my fanout to be about 20, as opposed to the 92 that I am seeing now.

I know that this can be done via compaction. However, I cannot make use of compaction because our queries are for real-time data.

### Implementation

I do not have an implementation in mind, but I do wish to contribute the implementation myself.

### Related

- https://github.com/apache/druid/issues/12929 : not the same as this, because I don't want to add the Kafka partition info
- https://imply.io/blog/multi-dimensional-range-partitioning/
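
To make the expected effect from the Description concrete, here is a rough illustration in Python. It is not a proposed implementation and does not use any Druid API. The `RealtimeSegment` class, the `prune_by_tenant` function, the tenant name `acme`, and the one-segment-per-Kafka-partition layout are all assumptions made up for this sketch. It only shows how pruning on `tenant` would cut the fanout from about 92 to about 20.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RealtimeSegment:
    """Hypothetical descriptor of one realtime segment (not a Druid class)."""
    kafka_partition: int
    tenants: frozenset  # tenants whose rows may appear in this segment


def prune_by_tenant(segments, tenant):
    """Keep only the segments that may contain rows for the given tenant."""
    return [s for s in segments if tenant in s.tenants]


def build_example(num_segments=92, tenant_partitions=20):
    """Assumed layout: one segment per Kafka partition in the queried time
    chunk, with tenant 'acme' confined to the first `tenant_partitions`."""
    segments = []
    for p in range(num_segments):
        tenants = {f"other-{p}"}
        if p < tenant_partitions:
            tenants.add("acme")
        segments.append(RealtimeSegment(p, frozenset(tenants)))
    return segments


segments = build_example()
fanout_without_pruning = len(segments)
fanout_with_pruning = len(prune_by_tenant(segments, "acme"))
print(fanout_without_pruning, fanout_with_pruning)
```

In this toy layout a query filtered on `acme` touches 20 segments instead of all 92. How the broker would actually learn which tenants a realtime segment can contain is the open question of this issue.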
