gianm commented on code in PR #13503:
URL: https://github.com/apache/druid/pull/13503#discussion_r1040495755
########## docs/ingestion/tasks.md: ##########

@@ -343,6 +343,27 @@ You can override the task priority by setting your priority in the task context
   "priority" : 100
 }
 ```
+<a name="actions"></a>
+
+## Task actions
+
+These are various overlord actions performed by tasks during their lifecycle. Some typical actions are as follows:
+- `lockAcquire`: acquires a time-chunk lock on an interval for the task
+- `lockRelease`: releases a lock acquired by the task on an interval
+- `segmentInsertion`: inserts segments into metadata store

Review Comment:
   Worth mentioning `segmentTransactionalInsert` too?

########## docs/ingestion/tasks.md: ##########

@@ -343,6 +343,27 @@ You can override the task priority by setting your priority in the task context
   "priority" : 100
 }
 ```
+<a name="actions"></a>
+
+## Task actions
+
+These are various overlord actions performed by tasks during their lifecycle. Some typical actions are as follows:
+- `lockAcquire`: acquires a time-chunk lock on an interval for the task
+- `lockRelease`: releases a lock acquired by the task on an interval
+- `segmentInsertion`: inserts segments into metadata store
+- `segmentAllocate`: allocates pending segments to a task to write rows
+- etc.
+
+### Batching `segmentAllocate` actions
+
+In a cluster with a large number of concurrent tasks (say > 1000), `segmentAllocate` actions on the overlord may take very long intervals of time to finish thus causing spikes in the `task/action/run/time`. This may result in lag building up while a task waits for a segment to get allocated.

Review Comment:
   I think this is going to be useful even at much lower concurrent task counts. I have seen people have issues with allocation timings around task rollover even with just 10s of tasks, requiring a scale-up of metadata store size.
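For readers following the thread: the batching under discussion is toggled by the two new Overlord properties this PR adds to `docs/configuration/index.md`. A minimal, illustrative `overlord/runtime.properties` fragment (values here are only an example; the defaults shown in comments are the ones proposed in the diff) might look like:

```properties
# Opt in to batched segment allocation (proposed default: false)
druid.indexer.tasklock.batchSegmentAllocation=true

# How long (in milliseconds) to keep collecting segment allocate
# requests into a batch before executing it (proposed default: 500)
druid.indexer.tasklock.batchAllocationWaitTime=500
```

Note that `batchAllocationWaitTime` takes effect only when `batchSegmentAllocation` is enabled.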
########## docs/configuration/index.md: ##########

@@ -1112,6 +1112,8 @@ These Overlord static configurations can be defined in the `overlord/runtime.pro
 |`druid.indexer.storage.type`|Choices are "local" or "metadata". Indicates whether incoming tasks should be stored locally (in heap) or in metadata storage. "local" is mainly for internal testing while "metadata" is recommended in production because storing incoming tasks in metadata storage allows for tasks to be resumed if the Overlord should fail.|local|
 |`druid.indexer.storage.recentlyFinishedThreshold`|Duration of time to store task results. Default is 24 hours. If you have hundreds of tasks running in a day, consider increasing this threshold.|PT24H|
 |`druid.indexer.tasklock.forceTimeChunkLock`|_**Setting this to false is still experimental**_<br/> If set, all tasks are enforced to use time chunk lock. If not set, each task automatically chooses a lock type to use. This configuration can be overwritten by setting `forceTimeChunkLock` in the [task context](../ingestion/tasks.md#context). See [Task Locking & Priority](../ingestion/tasks.md#context) for more details about locking in tasks.|true|
+|`druid.indexer.tasklock.batchSegmentAllocation`| If set to true, segment allocate actions are performed in batches to improve the throughput and reduce the average `task/action/run/time`.|false|
+|`druid.indexer.tasklock.batchAllocationWaitTime`|Milliseconds to wait between adding the first segment allocate action to a batch and executing that batch. The waiting time allows the batch to add more requests and thus improve the average segment allocation run time. This configuration takes effect only if `batchSegmentAllocation` is enabled. <br> This value should be decreased __only if__ there are failures while allocating segments due to metadata operations on a very large batch.|500|

Review Comment:
   500 milliseconds already seems pretty short.

   If people are getting failures due to too-large batches, I wonder if reducing this will reliably fix their problem. Maybe it's better to suggest they turn off batch segment allocation completely. Ideally, the batching would be robust to this case, and avoid generating oversized metadata operations. Is that possible?

########## docs/operations/metrics.md: ##########

@@ -230,8 +230,14 @@ Note: If the JVM does not support CPU time measurement for the current thread, `
 |------|-----------|------------------------------------------------------------|------------|
 |`task/run/time`|Milliseconds taken to run a task.| `dataSource`, `taskId`, `taskType`, `taskStatus`|Varies|
 |`task/pending/time`|Milliseconds taken for a task to wait for running.| `dataSource`, `taskId`, `taskType`|Varies|
-|`task/action/log/time`|Milliseconds taken to log a task action to the audit log.| `dataSource`, `taskId`, `taskType`|< 1000 (subsecond)|
-|`task/action/run/time`|Milliseconds taken to execute a task action.| `dataSource`, `taskId`, `taskType`|Varies from subsecond to a few seconds, based on action type.|
+|`task/action/log/time`|Milliseconds taken to log a task action to the audit log.| `dataSource`, `taskId`, `taskType`, `taskActionType`|< 1000 (subsecond)|
+|`task/action/run/time`|Milliseconds taken to execute a task action.| `dataSource`, `taskId`, `taskType`, `taskActionType`|Varies from subsecond to a few seconds, based on action type.|
+|`task/action/success/count`|Number of task actions that were executed successfully during the emission period. Currently only being emitted for batched `segmentAllocate` actions.| `dataSource`, `taskId`, `taskType`, `taskActionType`|Varies|
+|`task/action/failed/count`|Number of task actions that failed during the emission period. Currently only being emitted for batched `segmentAllocate` actions.| `dataSource`, `taskId`, `taskType`, `taskActionType`|Varies|

Review Comment:
   Would be good to link all mentions of "batched segmentAllocate actions" to `../ingestion/tasks.md#batching-segmentallocate-actions`.

########## docs/configuration/index.md: ##########

@@ -1112,6 +1112,8 @@ These Overlord static configurations can be defined in the `overlord/runtime.pro
 |`druid.indexer.storage.type`|Choices are "local" or "metadata". Indicates whether incoming tasks should be stored locally (in heap) or in metadata storage. "local" is mainly for internal testing while "metadata" is recommended in production because storing incoming tasks in metadata storage allows for tasks to be resumed if the Overlord should fail.|local|
 |`druid.indexer.storage.recentlyFinishedThreshold`|Duration of time to store task results. Default is 24 hours. If you have hundreds of tasks running in a day, consider increasing this threshold.|PT24H|
 |`druid.indexer.tasklock.forceTimeChunkLock`|_**Setting this to false is still experimental**_<br/> If set, all tasks are enforced to use time chunk lock. If not set, each task automatically chooses a lock type to use. This configuration can be overwritten by setting `forceTimeChunkLock` in the [task context](../ingestion/tasks.md#context). See [Task Locking & Priority](../ingestion/tasks.md#context) for more details about locking in tasks.|true|
+|`druid.indexer.tasklock.batchSegmentAllocation`| If set to true, segment allocate actions are performed in batches to improve the throughput and reduce the average `task/action/run/time`.|false|

Review Comment:
   Would be good to link to `../ingestion/tasks.md#batching-segmentallocate-actions`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
