writer-jill commented on code in PR #13503:
URL: https://github.com/apache/druid/pull/13503#discussion_r1041174331
########## docs/configuration/index.md: ##########

@@ -1112,6 +1112,8 @@ These Overlord static configurations can be defined in the `overlord/runtime.pro
|`druid.indexer.storage.type`|Choices are "local" or "metadata". Indicates whether incoming tasks should be stored locally (in heap) or in metadata storage. "local" is mainly for internal testing while "metadata" is recommended in production because storing incoming tasks in metadata storage allows for tasks to be resumed if the Overlord should fail.|local|
|`druid.indexer.storage.recentlyFinishedThreshold`|Duration of time to store task results. Default is 24 hours. If you have hundreds of tasks running in a day, consider increasing this threshold.|PT24H|
|`druid.indexer.tasklock.forceTimeChunkLock`|_**Setting this to false is still experimental**_<br/> If set, all tasks are enforced to use time chunk lock. If not set, each task automatically chooses a lock type to use. This configuration can be overwritten by setting `forceTimeChunkLock` in the [task context](../ingestion/tasks.md#context). See [Task Locking & Priority](../ingestion/tasks.md#context) for more details about locking in tasks.|true|
+|`druid.indexer.tasklock.batchSegmentAllocation`| If set to true, segment allocate actions are performed in batches to improve the throughput and reduce the average `task/action/run/time`. See [batching `segmentAllocate` actions](../ingestion/tasks.md#batching-segmentallocate-actions) for details.|false|

Review Comment:

```suggestion
|`druid.indexer.tasklock.batchSegmentAllocation`| If set to true, Druid performs segment allocate actions in batches to improve throughput and reduce the average `task/action/run/time`. See [batching `segmentAllocate` actions](../ingestion/tasks.md#batching-segmentallocate-actions) for details.|false|
```

########## docs/ingestion/tasks.md: ##########

@@ -343,6 +343,27 @@ You can override the task priority by setting your priority in the task context "priority" : 100 } ```
+<a name="actions"></a>
+
+## Task actions
+
+These are various overlord actions performed by tasks during their lifecycle. Some typical actions are as follows:

Review Comment:

```suggestion
Task actions are overlord actions performed by tasks during their lifecycle. Some typical task actions are:
```

########## docs/ingestion/tasks.md: ##########

@@ -343,6 +343,27 @@ You can override the task priority by setting your priority in the task context "priority" : 100 } ```
+<a name="actions"></a>
+
+## Task actions
+
+These are various overlord actions performed by tasks during their lifecycle. Some typical actions are as follows:
+- `lockAcquire`: acquires a time-chunk lock on an interval for the task
+- `lockRelease`: releases a lock acquired by the task on an interval
+- `segmentTransactionalInsert`: publishes new segments created by a task and optionally overwrites and/or drops existing segments in a single transaction
+- `segmentAllocate`: allocates pending segments to a task to write rows
+- etc.

Review Comment:

```suggestion
```

########## docs/ingestion/tasks.md: ##########

@@ -343,6 +343,27 @@ You can override the task priority by setting your priority in the task context "priority" : 100 } ```
+<a name="actions"></a>
+
+## Task actions
+
+These are various overlord actions performed by tasks during their lifecycle.
Some typical actions are as follows:
+- `lockAcquire`: acquires a time-chunk lock on an interval for the task
+- `lockRelease`: releases a lock acquired by the task on an interval
+- `segmentTransactionalInsert`: publishes new segments created by a task and optionally overwrites and/or drops existing segments in a single transaction
+- `segmentAllocate`: allocates pending segments to a task to write rows
+- etc.
+
+### Batching `segmentAllocate` actions
+
+In a cluster with several concurrent tasks, `segmentAllocate` actions on the overlord may take very long intervals of time to finish thus causing spikes in the `task/action/run/time`. This may result in ingestion lag building up while a task waits for a segment to get allocated.
+The root causes of such spikes are:
+- several concurrent tasks trying to allocate segments for the same datasource and interval
+- large number of metadata calls made to the segments and pending segments tables
+- concurrency limitations while acquiring a task lock required for allocating a segment
+
+Since the contention typically arises from tasks allocating segments for the same datasource and interval, the run times can be improved by batching the actions together.
+Batched segment allocation can be enabled on the overlord by setting `druid.indexer.tasklock.batchSegmentAllocation=true`.See [overlord configuration](../configuration/index.md#overlord-operations) for more details.

Review Comment:

```suggestion
To enable batched segment allocation on the overlord, set `druid.indexer.tasklock.batchSegmentAllocation` to `true`. See [overlord configuration](../configuration/index.md#overlord-operations) for more details.
```

########## docs/ingestion/tasks.md: ##########

@@ -343,6 +343,27 @@ You can override the task priority by setting your priority in the task context "priority" : 100 } ```
+<a name="actions"></a>
+
+## Task actions
+
+These are various overlord actions performed by tasks during their lifecycle.
Some typical actions are as follows:
+- `lockAcquire`: acquires a time-chunk lock on an interval for the task
+- `lockRelease`: releases a lock acquired by the task on an interval
+- `segmentTransactionalInsert`: publishes new segments created by a task and optionally overwrites and/or drops existing segments in a single transaction
+- `segmentAllocate`: allocates pending segments to a task to write rows
+- etc.
+
+### Batching `segmentAllocate` actions
+
+In a cluster with several concurrent tasks, `segmentAllocate` actions on the overlord may take very long intervals of time to finish thus causing spikes in the `task/action/run/time`. This may result in ingestion lag building up while a task waits for a segment to get allocated.
+The root causes of such spikes are:

Review Comment:

```suggestion
The root cause of such spikes is likely to be one or more of the following:
```

########## docs/configuration/index.md: ##########

@@ -1112,6 +1112,8 @@ These Overlord static configurations can be defined in the `overlord/runtime.pro
|`druid.indexer.storage.type`|Choices are "local" or "metadata". Indicates whether incoming tasks should be stored locally (in heap) or in metadata storage. "local" is mainly for internal testing while "metadata" is recommended in production because storing incoming tasks in metadata storage allows for tasks to be resumed if the Overlord should fail.|local|
|`druid.indexer.storage.recentlyFinishedThreshold`|Duration of time to store task results. Default is 24 hours. If you have hundreds of tasks running in a day, consider increasing this threshold.|PT24H|
|`druid.indexer.tasklock.forceTimeChunkLock`|_**Setting this to false is still experimental**_<br/> If set, all tasks are enforced to use time chunk lock. If not set, each task automatically chooses a lock type to use. This configuration can be overwritten by setting `forceTimeChunkLock` in the [task context](../ingestion/tasks.md#context).
See [Task Locking & Priority](../ingestion/tasks.md#context) for more details about locking in tasks.|true|
+|`druid.indexer.tasklock.batchSegmentAllocation`| If set to true, segment allocate actions are performed in batches to improve the throughput and reduce the average `task/action/run/time`. See [batching `segmentAllocate` actions](../ingestion/tasks.md#batching-segmentallocate-actions) for details.|false|
+|`druid.indexer.tasklock.batchAllocationWaitTime`|Milliseconds to wait between adding the first segment allocate action to a batch and executing that batch. The waiting time allows the batch to add more requests and thus improve the average segment allocation run time. This configuration takes effect only if `batchSegmentAllocation` is enabled.|500|

Review Comment:

```suggestion
|`druid.indexer.tasklock.batchAllocationWaitTime`|Number of milliseconds after Druid adds the first segment allocate action to a batch, until it executes the batch. Allows the batch to add more requests and improve the average segment allocation run time. This configuration takes effect only if `batchSegmentAllocation` is enabled.|500|
```

########## docs/ingestion/tasks.md: ##########

@@ -343,6 +343,27 @@ You can override the task priority by setting your priority in the task context "priority" : 100 } ```
+<a name="actions"></a>
+
+## Task actions
+
+These are various overlord actions performed by tasks during their lifecycle. Some typical actions are as follows:
+- `lockAcquire`: acquires a time-chunk lock on an interval for the task
+- `lockRelease`: releases a lock acquired by the task on an interval
+- `segmentTransactionalInsert`: publishes new segments created by a task and optionally overwrites and/or drops existing segments in a single transaction
+- `segmentAllocate`: allocates pending segments to a task to write rows
+- etc.
+
+### Batching `segmentAllocate` actions
+
+In a cluster with several concurrent tasks, `segmentAllocate` actions on the overlord may take very long intervals of time to finish thus causing spikes in the `task/action/run/time`. This may result in ingestion lag building up while a task waits for a segment to get allocated.

Review Comment:

```suggestion
In a cluster with several concurrent tasks, `segmentAllocate` actions on the overlord can take a long time to finish, causing spikes in the `task/action/run/time`. This can result in ingestion lag building up while a task waits for a segment to be allocated.
```

########## docs/ingestion/tasks.md: ##########

@@ -343,6 +343,27 @@ You can override the task priority by setting your priority in the task context "priority" : 100 } ```
+<a name="actions"></a>
+
+## Task actions
+
+These are various overlord actions performed by tasks during their lifecycle. Some typical actions are as follows:
+- `lockAcquire`: acquires a time-chunk lock on an interval for the task
+- `lockRelease`: releases a lock acquired by the task on an interval
+- `segmentTransactionalInsert`: publishes new segments created by a task and optionally overwrites and/or drops existing segments in a single transaction
+- `segmentAllocate`: allocates pending segments to a task to write rows
+- etc.
+
+### Batching `segmentAllocate` actions
+
+In a cluster with several concurrent tasks, `segmentAllocate` actions on the overlord may take very long intervals of time to finish thus causing spikes in the `task/action/run/time`. This may result in ingestion lag building up while a task waits for a segment to get allocated.
+The root causes of such spikes are:
+- several concurrent tasks trying to allocate segments for the same datasource and interval
+- large number of metadata calls made to the segments and pending segments tables
+- concurrency limitations while acquiring a task lock required for allocating a segment
+
+Since the contention typically arises from tasks allocating segments for the same datasource and interval, the run times can be improved by batching the actions together.

Review Comment:

```suggestion
Since the contention typically arises from tasks allocating segments for the same datasource and interval, you can improve the run times by batching the actions together.
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
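For reference, the two overlord properties discussed in this review could be combined in `overlord/runtime.properties` roughly as below. This is only a sketch: `true` turns on the feature under review, and `500` simply restates the documented default wait time.

```properties
# overlord/runtime.properties (sketch combining the properties discussed in this review)

# Perform segmentAllocate actions in batches to improve throughput
# and reduce the average task/action/run/time (documented default: false)
druid.indexer.tasklock.batchSegmentAllocation=true

# Milliseconds to wait after the first segmentAllocate action is added to a
# batch before the batch executes; takes effect only when
# batchSegmentAllocation is enabled (500 is the documented default)
druid.indexer.tasklock.batchAllocationWaitTime=500
```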
