[I] Allow to retain X amount of data on indexers at handoff (druid)

via GitHub Sun, 15 Oct 2023 23:09:42 -0700


m-ghazanfar opened a new issue, #15167:
URL: https://github.com/apache/druid/issues/15167

### Description
We use Druid to store and query telemetry data. We ingest data into Druid
via Kafka. Our ingestion rate is around 1.7M messages per sec.
Our query load is about `1000` qps and most queries request data
that's in the time range `[now-5, now-8]`.

Since we have a high ingestion rate, we produce a lot of segments - about
`2180` segments every hour. We have a segment granularity of `10` mins.
And since our queries are for real-time data, Druid ends up querying all
segments within a time chunk - about `360` segments.
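
As a quick sanity check on those numbers, the per-time-chunk segment count follows from the hourly segment count and the segment granularity. The snippet below is only illustrative arithmetic using the figures quoted above; the variable names are made up for this example.

```python
# Figures quoted in the description above.
SEGMENTS_PER_HOUR = 2180
SEGMENT_GRANULARITY_MINUTES = 10

# A 10-minute segment granularity gives 6 time chunks per hour.
chunks_per_hour = 60 // SEGMENT_GRANULARITY_MINUTES

# Average number of segments that land in a single time chunk.
segments_per_chunk = SEGMENTS_PER_HOUR / chunks_per_hour

print(f"time chunks per hour: {chunks_per_hour}")
print(f"segments per time chunk: {segments_per_chunk:.0f}")
```

This gives roughly 363 segments per time chunk, in line with the "about `360`" figure.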

We have configured our Druid instance to hand off segments every `10` mins.

With this, we see a query count pattern across indexers and historicals like
this:
<img width="1651" alt="Screenshot 2023-10-16 at 10 37 01 AM"
src="https://github.com/apache/druid/assets/88474681/6d19fb00-a4e9-45ec-93b6-cd4c3d1db7d9">

Right after segment handoff, our real-time queries go to both indexers and
historicals. This is in line with the current behaviour of segment handoff:
all data on the indexers gets handed off when a handoff happens.

I would like to have a way to retain, say, 10 mins (or 10M rows) of data on
indexers at all times, so that I can avoid the periodic increased load on
the historicals.
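
To make the request more concrete, here is a rough sketch of the kind of selection logic this could involve at handoff time. It is illustrative Python, not Druid code: `PendingSegment`, `split_for_handoff`, `retain_period` and `retain_rows` are hypothetical names that do not exist in Druid, and the assumption that a segment is retained when it satisfies either configured limit is just one possible reading of "10 mins (or 10M rows)".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PendingSegment:
    """Hypothetical stand-in for a segment still held by an indexer."""
    segment_id: str
    interval_end: datetime
    num_rows: int


def split_for_handoff(
    segments: List[PendingSegment],
    now: datetime,
    retain_period: Optional[timedelta] = None,
    retain_rows: Optional[int] = None,
) -> Tuple[List[PendingSegment], List[PendingSegment]]:
    """Return (to_hand_off, to_retain), keeping the newest data on the indexer.

    Assumption: a segment is retained if it falls inside the time window
    OR still fits in the row budget. With no limit set, everything is
    handed off, which matches the current behaviour.
    """
    newest_first = sorted(segments, key=lambda s: s.interval_end, reverse=True)
    to_retain: List[PendingSegment] = []
    retained_rows = 0
    for seg in newest_first:
        within_period = (
            retain_period is not None and seg.interval_end > now - retain_period
        )
        within_rows = (
            retain_rows is not None and retained_rows + seg.num_rows <= retain_rows
        )
        if not (within_period or within_rows):
            break  # this segment and everything older gets handed off
        to_retain.append(seg)
        retained_rows += seg.num_rows
    to_hand_off = newest_first[len(to_retain):]
    return to_hand_off, to_retain


# Example: three 10-minute segments, keep the most recent 10 minutes.
now = datetime(2023, 10, 16, 10, 0)
segments = [
    PendingSegment("seg-0940", datetime(2023, 10, 16, 9, 40), 5_000_000),
    PendingSegment("seg-0950", datetime(2023, 10, 16, 9, 50), 5_000_000),
    PendingSegment("seg-1000", datetime(2023, 10, 16, 10, 0), 5_000_000),
]
hand_off, retain = split_for_handoff(
    segments, now, retain_period=timedelta(minutes=10)
)
print([s.segment_id for s in hand_off], [s.segment_id for s in retain])
```

With a 10-minute window only the newest segment stays on the indexer; with `retain_rows=10_000_000` instead, the two newest segments would stay. How this would interact with the existing handoff and publishing logic is exactly the part I'd need direction on.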

Adding @kfaraz because I discussed this with him in person.

With some direction, I'd be happy to implement this myself.

