m-ghazanfar opened a new issue, #15167: URL: https://github.com/apache/druid/issues/15167
### Description

We use Druid to store and query telemetry data, which we ingest via Kafka. Our ingestion rate is around 1.7M messages per second.

Our query load is about `1000` qps, and most queries request data in the time range `[now-5, now-8]`.

Because of this high ingestion rate, we produce a lot of segments: about `2180` every hour. Our segment granularity is `10` minutes, and since our queries are for real-time data, Druid ends up querying all the segments in a time chunk, which is about `360` segments.

We have configured our Druid instance to hand off segments every `10` minutes. With this setup, we see the following query count pattern across indexers and historicals:

![Screenshot 2023-10-16 at 10 37 01 AM: query count across indexers and historicals](https://github.com/apache/druid/assets/88474681/6d19fb00-a4e9-45ec-93b6-cd4c3d1db7d9)

Right after a segment handoff, our real-time queries go to both the indexers and the historicals. This matches the current handoff behaviour: when a handoff happens, all of the data on the indexers is handed off.

I would like a way to retain, say, 10 minutes (or 10M rows) of data on the indexers at all times, so that I can avoid the periodic spike in load on the historicals.

Adding @kfaraz because I discussed this with him in person. With some direction, I'd be happy to implement this myself.
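The sketch below is an illustrative aid, not Druid code (Druid itself is written in Java). It first checks the "about 360 segments per time chunk" figure from the numbers above (2180 segments per hour at 10-minute granularity), then sketches one possible rule for the requested behaviour: walk the pending segments newest-first and keep them on the indexer until either the retention window (e.g. 10 minutes) or the row budget (e.g. 10M rows) is covered, handing off the rest. `PendingSegment` and `select_for_handoff` are hypothetical names, and the "whichever threshold is reached first" semantics is an assumption about how the time and row limits would combine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


def segments_per_time_chunk(segments_per_hour: int, granularity: timedelta) -> float:
    """Average number of segments in one time chunk (figures from the issue)."""
    chunks_per_hour = timedelta(hours=1) / granularity  # 6.0 for 10-minute chunks
    return segments_per_hour / chunks_per_hour


@dataclass
class PendingSegment:
    """Hypothetical stand-in for a segment held by a real-time ingestion task."""
    interval_end: datetime
    num_rows: int


def select_for_handoff(segments, now, retain_period, retain_rows):
    """Hypothetical rule: retain the newest segments on the indexer until they cover
    `retain_period` of time or `retain_rows` rows (whichever is reached first);
    return the segments to hand off, oldest first."""
    cutoff = now - retain_period
    retained_rows = 0
    handoff = []
    # Newest first, so the most recent data is what stays on the indexer.
    for seg in sorted(segments, key=lambda s: s.interval_end, reverse=True):
        if seg.interval_end > cutoff and retained_rows < retain_rows:
            retained_rows += seg.num_rows  # keep serving this segment from the indexer
        else:
            handoff.append(seg)  # older data: eligible for handoff to historicals
    return sorted(handoff, key=lambda s: s.interval_end)


if __name__ == "__main__":
    print(round(segments_per_time_chunk(2180, timedelta(minutes=10))))

    now = datetime(2023, 10, 16, 10, 40)
    pending = [
        PendingSegment(datetime(2023, 10, 16, 10, 30), 8_000_000),
        PendingSegment(datetime(2023, 10, 16, 10, 40), 8_000_000),
        PendingSegment(datetime(2023, 10, 16, 10, 50), 5_000_000),
    ]
    for seg in select_for_handoff(pending, now, timedelta(minutes=10), 10_000_000):
        print("hand off segment ending", seg.interval_end)
```

With a 10-minute window and a 10M-row budget, only the oldest segment is handed off; the two newest ones stay on the indexer, so queries for the last few minutes would not need to fan out to historicals right after a handoff. Bounding retention by rows as well as time is one way to cap indexer memory when the ingestion rate spikes.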
