gianm commented on issue #11414: URL: https://github.com/apache/druid/issues/11414#issuecomment-2835791554
I think there are actually two root causes of ingestion lag:

1. Bottlenecks in the Overlord around operations like leader failover, task discovery, and segment allocation. If the Overlord cannot execute these operations quickly, ingestion tasks will at some point stall. (For example: if a task receives data for a new time period, it cannot proceed until a segment is allocated.)
2. When the Overlord rolls tasks, it first tells the old ones to stop ingesting, then launches new ones. This means there is always a stall of roughly one task-launch-time whenever a roll happens.

A lot of work has been done on area (1), recently led by @kfaraz. In an earlier comment I listed some enhancements that were made in Druid 25. Since then, there has been more work; a few highlights:

- Batch segment allocation enabled by default: https://github.com/apache/druid/pull/13942
- Reduced metadata store usage during segment allocation: https://github.com/apache/druid/pull/17496
- Segment metadata cache on the Overlord (ships in Druid 33 but not yet on by default): https://github.com/apache/druid/pull/17653
- Improved concurrency in TaskQueue (hasn't shipped yet): https://github.com/apache/druid/pull/17828

There are other PRs too; these are just the first few that come to mind.

As far as I know, there hasn't been work done on (2). In most of our production environments it is a source of small but noticeable lag; with the k8s launcher, it's generally 10–15s of lag each rollover period. The main reason tasks are rolled like this is that the new tasks have their start offsets included in the task spec, so they can't be created until the old tasks stop reading. This could be fixed by doing something like:

- Remove start offsets from the task spec.
- Add a `setStartOffsets` API call that the Overlord calls once it knows the start offsets.
- Have the Overlord launch the new tasks early.
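To make the sequencing concrete, here is a minimal sketch of the proposed handoff. This is not actual Druid code; the `Task` class, `set_start_offsets`, and `rollover` are hypothetical names standing in for the proposed behavior, where the launch time of the replacement task overlaps with the old task's final stretch of ingestion instead of adding to the lag.

```python
# Hypothetical sketch of the proposed rollover handoff (not real Druid APIs).
# Today: stop old task -> launch new task (stall ~= one task-launch-time).
# Proposed: launch the new task early, without start offsets, then hand
# the old task's end offsets over once it stops reading.

class Task:
    def __init__(self, task_id):
        self.task_id = task_id
        self.start_offsets = None   # proposed: omitted from the task spec
        self.reading = False

    def set_start_offsets(self, offsets):
        # Proposed API: the Overlord calls this once the old task's
        # end offsets are known; the new task can then begin reading.
        self.start_offsets = offsets
        self.reading = True

    def stop_reading(self):
        # The old task's final offsets become the new task's start offsets.
        self.reading = False
        return {"partition-0": 1500}  # illustrative end offsets


def rollover(launch, old_task):
    # 1. Launch the replacement early, so pod scheduling / JVM startup
    #    overlaps with the old task's ingestion instead of stalling it.
    new_task = launch("task-2")
    # 2. Stop the old task and capture its end offsets.
    end_offsets = old_task.stop_reading()
    # 3. Hand the offsets to the already-running new task.
    new_task.set_start_offsets(end_offsets)
    return new_task


old = Task("task-1")
old.reading = True
new = rollover(lambda task_id: Task(task_id), old)
print(new.reading, new.start_offsets)
```

In this model the only gap between old and new tasks is the offset handoff itself, rather than the full task launch.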
To make this possible, the Overlord needs a little warning of when the new tasks will be needed (maybe 1 minute). For `taskDuration`-based rollover this is easy. For `maxRowsPerSegment` or `maxTotalRows` based rollover, the supervisor doesn't have visibility into the row counts, so it would need to retrieve them from tasks periodically in order to get proper warning.
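A rough sketch of how that warning could work, assuming the supervisor polls each task's current row count and ingestion rate (the function names and the 60-second window are illustrative, not existing Druid behavior):

```python
# Hypothetical sketch: the supervisor periodically polls each task's row
# count to predict when a maxRowsPerSegment / maxTotalRows rollover will
# occur, giving itself ~1 minute of warning to pre-launch replacements.

def predict_rollover_seconds(current_rows, max_total_rows, rows_per_second):
    """Estimate seconds until the row limit triggers a rollover."""
    if rows_per_second <= 0:
        return float("inf")
    return max(0.0, (max_total_rows - current_rows) / rows_per_second)

def should_prelaunch(current_rows, max_total_rows, rows_per_second,
                     warning_seconds=60):
    # Pre-launch the replacement task once the predicted rollover falls
    # inside the warning window.
    eta = predict_rollover_seconds(current_rows, max_total_rows,
                                   rows_per_second)
    return eta <= warning_seconds

# Example: 9.4M of a 10M row limit at 5,000 rows/s -> rollover in 120s,
# still outside the 60s warning window, so no pre-launch yet.
print(predict_rollover_seconds(9_400_000, 10_000_000, 5_000))  # 120.0
print(should_prelaunch(9_400_000, 10_000_000, 5_000))          # False
print(should_prelaunch(9_800_000, 10_000_000, 5_000))          # True (40s away)
```

The polling interval would need to be short relative to the warning window for the estimate to be useful.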
