Laboltus opened a new issue, #18707:
URL: https://github.com/apache/druid/issues/18707
### Motivation
Recently we faced with timeouts from MySQL metadata storage. After
investigation I found out that DRUID queries are not optimized for large amount
of segments. We have 12M segments in druid_segments table and 5M of them are
used. Most queries from Overlord/Coordinator use index by `used` or by
`datasource` and scans hundreds of thousands or even millions rows every time.
Considering size of the table (80G+ in our case) it leads to i/o overload.
### Proposed changes
I found 3 sources of the load:
1. In DRUID 28+ we have the new used_status_last_updated column and some
background migration process in Coordinator which updates the initial NULL
values in this column.
```
FROM druid_segments
WHERE used_status_last_updated IS NULL
AND used = 0
LIMIT 100;
```
it uses index by `used`, so it tried to scan 7M rows, got timeout every time
and started over.
I replaced index by `used` with composite index
```
CREATE INDEX idx_used_used_status ON druid_segments (used,
used_status_last_updated);
ALTER TABLE druid_segments DROP INDEX idx_used;
```
and the load decreased significantly.
2. Overlords issue queried like this
```
SELECT payload
FROM druid_segments
WHERE used = 1
AND dataSource = 'DATASOURCE'
AND (
(start < '2025-10-29T14:00:00.000Z' AND `end` >
'2025-10-29T13:00:00.000Z')
OR (start = '-146136543-09-08T08:23:32.096Z' AND `end` !=
'146140482-04-24T15:36:27.903Z' AND `end` > '2025-10-29T13:00:00.000Z')
OR (start != '-146136543-09-08T08:23:32.096Z' AND `end` =
'146140482-04-24T15:36:27.903Z' AND start < '2025-10-29T14:00:00.000Z')
OR (start = '-146136543-09-08T08:23:32.096Z' AND `end` =
'146140482-04-24T15:36:27.903Z')
);
```
according to explain they use `used` or `dataSource` indexes so they scans
large amount of rows just to find segments inside one hour. I added 2 indexes
```
CREATE INDEX idx_seg_ds_used_start ON druid_segments (dataSource, used,
start);
CREATE INDEX idx_seg_ds_used_end ON druid_segments (dataSource, used,
`end`);
```
and our i/o load disappeared.
### Operational impact
Just to mention, you should not add indexes on large tables directly. I used
pt-online-schema-change with load limits.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]