writer-jill commented on code in PR #13503:
URL: https://github.com/apache/druid/pull/13503#discussion_r1041174331
########## docs/configuration/index.md: ##########

@@ -1112,6 +1112,8 @@ These Overlord static configurations can be defined in the `overlord/runtime.pro
|`druid.indexer.storage.type`|Choices are "local" or "metadata". Indicates whether incoming tasks should be stored locally (in heap) or in metadata storage. "local" is mainly for internal testing while "metadata" is recommended in production because storing incoming tasks in metadata storage allows for tasks to be resumed if the Overlord should fail.|local|
|`druid.indexer.storage.recentlyFinishedThreshold`|Duration of time to store task results. Default is 24 hours. If you have hundreds of tasks running in a day, consider increasing this threshold.|PT24H|
|`druid.indexer.tasklock.forceTimeChunkLock`|_**Setting this to false is still experimental**_<br/> If set, all tasks are enforced to use time chunk lock. If not set, each task automatically chooses a lock type to use. This configuration can be overwritten by setting `forceTimeChunkLock` in the [task context](../ingestion/tasks.md#context). See [Task Locking & Priority](../ingestion/tasks.md#context) for more details about locking in tasks.|true|
+|`druid.indexer.tasklock.batchSegmentAllocation`| If set to true, segment allocate actions are performed in batches to improve the throughput and reduce the average `task/action/run/time`. See [batching `segmentAllocate` actions](../ingestion/tasks.md#batching-segmentallocate-actions) for details.|false|

Review Comment:

```suggestion
|`druid.indexer.tasklock.batchSegmentAllocation`| If set to true, Druid performs segment allocate actions in batches to improve throughput and reduce the average `task/action/run/time`. See [batching `segmentAllocate` actions](../ingestion/tasks.md#batching-segmentallocate-actions) for details.|false|
```

########## docs/ingestion/tasks.md: ##########

@@ -343,6 +343,27 @@ You can override the task priority by setting your priority in the task context "priority" : 100 } ```
+<a name="actions"></a>
+
+## Task actions
+
+These are various overlord actions performed by tasks during their lifecycle. Some typical actions are as follows:

Review Comment:

```suggestion
Task actions are overlord actions performed by tasks during their lifecycle. Some typical task actions are:
```

########## docs/ingestion/tasks.md: ##########

@@ -343,6 +343,27 @@ You can override the task priority by setting your priority in the task context "priority" : 100 } ```
+<a name="actions"></a>
+
+## Task actions
+
+These are various overlord actions performed by tasks during their lifecycle. Some typical actions are as follows:
+- `lockAcquire`: acquires a time-chunk lock on an interval for the task
+- `lockRelease`: releases a lock acquired by the task on an interval
+- `segmentTransactionalInsert`: publishes new segments created by a task and optionally overwrites and/or drops existing segments in a single transaction
+- `segmentAllocate`: allocates pending segments to a task to write rows
+- etc.

Review Comment:

```suggestion
```

########## docs/ingestion/tasks.md: ##########

@@ -343,6 +343,27 @@ You can override the task priority by setting your priority in the task context "priority" : 100 } ```
+<a name="actions"></a>
+
+## Task actions
+
+These are various overlord actions performed by tasks during their lifecycle.
Some typical actions are as follows:
+- `lockAcquire`: acquires a time-chunk lock on an interval for the task
+- `lockRelease`: releases a lock acquired by the task on an interval
+- `segmentTransactionalInsert`: publishes new segments created by a task and optionally overwrites and/or drops existing segments in a single transaction
+- `segmentAllocate`: allocates pending segments to a task to write rows
+- etc.
+
+### Batching `segmentAllocate` actions
+
+In a cluster with several concurrent tasks, `segmentAllocate` actions on the overlord may take very long intervals of time to finish thus causing spikes in the `task/action/run/time`. This may result in ingestion lag building up while a task waits for a segment to get allocated.
+The root causes of such spikes are:
+- several concurrent tasks trying to allocate segments for the same datasource and interval
+- large number of metadata calls made to the segments and pending segments tables
+- concurrency limitations while acquiring a task lock required for allocating a segment
+
+Since the contention typically arises from tasks allocating segments for the same datasource and interval, the run times can be improved by batching the actions together.
+Batched segment allocation can be enabled on the overlord by setting `druid.indexer.tasklock.batchSegmentAllocation=true`.See [overlord configuration](../configuration/index.md#overlord-operations) for more details.

Review Comment:

```suggestion
To enable batched segment allocation on the overlord, set `druid.indexer.tasklock.batchSegmentAllocation` to `true`. See [overlord configuration](../configuration/index.md#overlord-operations) for more details.
```

########## docs/ingestion/tasks.md: ##########

@@ -343,6 +343,27 @@ You can override the task priority by setting your priority in the task context "priority" : 100 } ```
+<a name="actions"></a>
+
+## Task actions
+
+These are various overlord actions performed by tasks during their lifecycle.
Some typical actions are as follows:
+- `lockAcquire`: acquires a time-chunk lock on an interval for the task
+- `lockRelease`: releases a lock acquired by the task on an interval
+- `segmentTransactionalInsert`: publishes new segments created by a task and optionally overwrites and/or drops existing segments in a single transaction
+- `segmentAllocate`: allocates pending segments to a task to write rows
+- etc.
+
+### Batching `segmentAllocate` actions
+
+In a cluster with several concurrent tasks, `segmentAllocate` actions on the overlord may take very long intervals of time to finish thus causing spikes in the `task/action/run/time`. This may result in ingestion lag building up while a task waits for a segment to get allocated.
+The root causes of such spikes are:

Review Comment:

```suggestion
The root cause of such spikes is likely to be one or more of the following:
```

########## docs/configuration/index.md: ##########

@@ -1112,6 +1112,8 @@ These Overlord static configurations can be defined in the `overlord/runtime.pro
|`druid.indexer.storage.type`|Choices are "local" or "metadata". Indicates whether incoming tasks should be stored locally (in heap) or in metadata storage. "local" is mainly for internal testing while "metadata" is recommended in production because storing incoming tasks in metadata storage allows for tasks to be resumed if the Overlord should fail.|local|
|`druid.indexer.storage.recentlyFinishedThreshold`|Duration of time to store task results. Default is 24 hours. If you have hundreds of tasks running in a day, consider increasing this threshold.|PT24H|
|`druid.indexer.tasklock.forceTimeChunkLock`|_**Setting this to false is still experimental**_<br/> If set, all tasks are enforced to use time chunk lock. If not set, each task automatically chooses a lock type to use. This configuration can be overwritten by setting `forceTimeChunkLock` in the [task context](../ingestion/tasks.md#context).
See [Task Locking & Priority](../ingestion/tasks.md#context) for more details about locking in tasks.|true|
+|`druid.indexer.tasklock.batchSegmentAllocation`| If set to true, segment allocate actions are performed in batches to improve the throughput and reduce the average `task/action/run/time`. See [batching `segmentAllocate` actions](../ingestion/tasks.md#batching-segmentallocate-actions) for details.|false|
+|`druid.indexer.tasklock.batchAllocationWaitTime`|Milliseconds to wait between adding the first segment allocate action to a batch and executing that batch. The waiting time allows the batch to add more requests and thus improve the average segment allocation run time. This configuration takes effect only if `batchSegmentAllocation` is enabled.|500|

Review Comment:

```suggestion
|`druid.indexer.tasklock.batchAllocationWaitTime`|Number of milliseconds after Druid adds the first segment allocate action to a batch, until it executes the batch. Allows the batch to add more requests and improve the average segment allocation run time. This configuration takes effect only if `batchSegmentAllocation` is enabled.|500|
```

########## docs/ingestion/tasks.md: ##########

@@ -343,6 +343,27 @@ You can override the task priority by setting your priority in the task context "priority" : 100 } ```
+<a name="actions"></a>
+
+## Task actions
+
+These are various overlord actions performed by tasks during their lifecycle. Some typical actions are as follows:
+- `lockAcquire`: acquires a time-chunk lock on an interval for the task
+- `lockRelease`: releases a lock acquired by the task on an interval
+- `segmentTransactionalInsert`: publishes new segments created by a task and optionally overwrites and/or drops existing segments in a single transaction
+- `segmentAllocate`: allocates pending segments to a task to write rows
+- etc.
+
+### Batching `segmentAllocate` actions
+
+In a cluster with several concurrent tasks, `segmentAllocate` actions on the overlord may take very long intervals of time to finish thus causing spikes in the `task/action/run/time`. This may result in ingestion lag building up while a task waits for a segment to get allocated.

Review Comment:

```suggestion
In a cluster with several concurrent tasks, `segmentAllocate` actions on the overlord can take a long time to finish, causing spikes in the `task/action/run/time`. This can result in ingestion lag building up while a task waits for a segment to be allocated.
```

########## docs/ingestion/tasks.md: ##########

@@ -343,6 +343,27 @@ You can override the task priority by setting your priority in the task context "priority" : 100 } ```
+<a name="actions"></a>
+
+## Task actions
+
+These are various overlord actions performed by tasks during their lifecycle. Some typical actions are as follows:
+- `lockAcquire`: acquires a time-chunk lock on an interval for the task
+- `lockRelease`: releases a lock acquired by the task on an interval
+- `segmentTransactionalInsert`: publishes new segments created by a task and optionally overwrites and/or drops existing segments in a single transaction
+- `segmentAllocate`: allocates pending segments to a task to write rows
+- etc.
+
+### Batching `segmentAllocate` actions
+
+In a cluster with several concurrent tasks, `segmentAllocate` actions on the overlord may take very long intervals of time to finish thus causing spikes in the `task/action/run/time`. This may result in ingestion lag building up while a task waits for a segment to get allocated.
+The root causes of such spikes are:
+- several concurrent tasks trying to allocate segments for the same datasource and interval
+- large number of metadata calls made to the segments and pending segments tables
+- concurrency limitations while acquiring a task lock required for allocating a segment
+
+Since the contention typically arises from tasks allocating segments for the same datasource and interval, the run times can be improved by batching the actions together.

Review Comment:

```suggestion
Since the contention typically arises from tasks allocating segments for the same datasource and interval, you can improve the run times by batching the actions together.
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
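For reference, the two overlord properties discussed in this review could be combined in `overlord/runtime.properties` roughly as below. This is only a sketch: `true` turns on the feature under review, and `500` simply restates the documented default wait time.

```properties
# overlord/runtime.properties (sketch combining the properties discussed in this review)

# Perform segmentAllocate actions in batches to improve throughput
# and reduce the average task/action/run/time (documented default: false)
druid.indexer.tasklock.batchSegmentAllocation=true

# Milliseconds to wait after the first segmentAllocate action is added to a
# batch before the batch executes; takes effect only when
# batchSegmentAllocation is enabled (500 is the documented default)
druid.indexer.tasklock.batchAllocationWaitTime=500
```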
