m-ghazanfar opened a new issue, #15149:
URL: https://github.com/apache/druid/issues/15149

   ### Description
   
   We use druid to store and query telemetry data. We ingest data into Druid 
via Kafka. Our ingestion rate is around 1.7M messages per sec.
   Our query load is about `1000` qps, and most queries request data in the 
time range [now-5, now-8].
   
   Since we have a high ingestion rate, we produce a lot of segments. And since 
our queries are for real-time data, Druid ends up querying all segments within 
that time chunk.
   
   As the graphs below show, `726` queries on the broker translate to about 
`66.25k` queries on the indexers, which is a fanout of about `92`. We have `94` 
indexer nodes.
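
As a quick check on the arithmetic above, the fanout is the ratio of indexer queries to broker queries. The sketch below uses the rounded figures read off the graphs, so it lands near 91 rather than exactly on the quoted 92. The variable names are illustrative only.

```python
# Rounded figures quoted from the graphs in this issue.
broker_queries = 726
indexer_queries = 66_250
indexer_nodes = 94

# Fanout: indexer-level queries generated per broker-level query.
fanout = indexer_queries / broker_queries

# Share of the indexer fleet that a typical query ends up touching.
fleet_fraction = fanout / indexer_nodes

print(f"fanout = {fanout:.1f}")
print(f"share of indexers hit = {fleet_fraction:.0%}")
```

In other words, almost every query reaches almost every indexer node.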
   
   <img width="1651" alt="Screenshot 2023-10-13 at 10 54 56 AM" 
src="https://github.com/apache/druid/assets/88474681/844573ff-01cb-4f33-b7c0-a8f02a86d03a">
   <img width="1657" alt="Screenshot 2023-10-13 at 11 56 17 AM" 
src="https://github.com/apache/druid/assets/88474681/9590b56a-96bf-49ed-93cc-3786e1d09f2c">
   
   Our data has a `tenant` dimension, and every query filters on it.
   We want to perform secondary partitioning on the `tenant` dimension so 
that the broker can prune the segments that have to be queried.
   
   Data for one `tenant` is limited to a few Kafka partitions (about 20). So, 
with secondary partitioning in place, I would expect my fanout to be about 20, 
as opposed to the 92 that I am seeing now.
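
The expected effect can be sketched with a toy model. Everything here is hypothetical: the `Segment` class, the `prune` helper, the tenant names, and the assumption that each realtime segment advertises which tenants it holds are illustrations of the idea, not Druid's actual API or metadata. The model also assumes one realtime segment per indexer and that the tenant's roughly 20 Kafka partitions map to 20 distinct indexers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a realtime segment's metadata."""
    indexer: str         # node serving the segment
    tenants: frozenset   # tenant values present (assumed to be advertised)


def prune(segments, tenant):
    """Keep only the segments that can contain rows for the given tenant."""
    return [s for s in segments if tenant in s.tenants]


# Toy cluster: 94 indexers, one realtime segment each.
# Tenant "acme" is assumed to appear on the first 20 of them only.
segments = [
    Segment(
        indexer=f"indexer-{i}",
        tenants=frozenset({"acme", f"other-{i}"}) if i < 20
        else frozenset({f"other-{i}"}),
    )
    for i in range(94)
]

without_pruning = len(segments)              # every segment is queried
with_pruning = len(prune(segments, "acme"))  # only segments holding "acme"

print(without_pruning, with_pruning)
```

Under these assumptions the per-query fanout drops from the number of indexers to the number of indexers that actually hold the tenant's data.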
   
   I know that this can be done via compaction. However, I cannot make use of 
compaction because our queries are real-time.
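
For context, the compaction route would look roughly like the auto-compaction config below, with a range `partitionsSpec` on `tenant`. This is a sketch: the datasource name and row target are placeholders, and the field names follow the Druid documentation as best understood here, so they should be checked against the Druid version in use. Auto-compaction only touches data older than `skipOffsetFromLatest`, so it cannot help queries over the last few minutes.

```python
import json

# Sketch of an auto-compaction config with range partitioning on `tenant`.
# "telemetry" and the row target are placeholder values, not from this issue.
compaction_config = {
    "dataSource": "telemetry",
    # Data newer than this offset is skipped by auto-compaction.
    "skipOffsetFromLatest": "P1D",
    "tuningConfig": {
        "partitionsSpec": {
            "type": "range",
            "partitionDimensions": ["tenant"],
            "targetRowsPerSegment": 5_000_000,
        }
    },
}

print(json.dumps(compaction_config, indent=2))
```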
   
   
   ### Implementation
   I do not have an implementation in mind but do wish to contribute the 
implementation myself. 
   
   
   ### Related
   - https://github.com/apache/druid/issues/12929 : not the same as this 
because I don't want to add the kafka partition info
   - https://imply.io/blog/multi-dimensional-range-partitioning/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

