gianm commented on issue #11414:
URL: https://github.com/apache/druid/issues/11414#issuecomment-2835791554

   I think there are actually two root causes of ingestion lag:
   
   1. Bottlenecks in the Overlord around operations like leader failover, task 
discovery, segment allocations. If the Overlord cannot execute these operations 
quickly then ingestion tasks will at some point get stalled. (For example: if a 
task gets data for a new time period, it cannot proceed until a segment is 
allocated.)
   2. When the Overlord rolls tasks, it first tells the old ones to stop 
ingesting, then it launches new ones. This means that there is always a stall 
of roughly task-launch-time whenever a roll happens.
   
   A lot of work has been done on area (1), recently led by @kfaraz. In an 
earlier comment I listed some enhancements that were made in Druid 25. Since 
then, there has been more work, a few highlights including:
   
   - Batch segment allocation enabled by default: 
https://github.com/apache/druid/pull/13942
   - Reduced metadata store usage during segment allocation: 
https://github.com/apache/druid/pull/17496
   - Segment metadata cache on the Overlord (ships in Druid 33 but not yet on 
by default): https://github.com/apache/druid/pull/17653
   - Improve concurrency in TaskQueue (hasn't shipped yet): 
https://github.com/apache/druid/pull/17828
   
   There are other PRs too, these are just the first few that come to mind.
   
   As far as I know, there hasn't been work done on (2). In most of our 
production environments this is a source of small but noticeable lag. When we 
use the k8s launcher, it's generally 10–15s of lag each rollover period. The 
main reason that tasks are rolled like this is that the new tasks have their 
start offsets included in the task spec, so they can't be created until the old 
tasks stop reading. This could be fixed by doing something like:
   
   - Remove start offsets from the task spec
   - Add a `setStartOffsets` API call that the Overlord should call when it 
knows the start offsets
   - The Overlord should launch the new tasks early. To make this possible it 
needs to have a little warning of when the new tasks will be needed (maybe 1 
minute). For `taskDuration` based rollover this is easy. For 
`maxRowsPerSegment` or `maxTotalRows` based rollover, the supervisor doesn't 
have visibility into the numbers, so it will need to retrieve them from tasks 
periodically in order to get proper warning.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to