AmatyaAvadhanula commented on code in PR #15760:
URL: https://github.com/apache/druid/pull/15760#discussion_r1469056746
##########
docs/data-management/automatic-compaction.md:
##########
@@ -152,6 +142,34 @@
 druid.coordinator.compaction.duties=["compactSegments"]
 druid.coordinator.compaction.period=PT60S
 ```
+## Avoid conflicts with ingestion
+
+Compaction tasks may be interrupted when they interfere with ingestion. For example, this occurs when an ingestion task needs to write data to a segment for a time interval locked for compaction. If there are continuous failures that prevent compaction from making progress, consider one of the following strategies:
+
+* Enable [concurrent append and replace tasks](#enable-concurrent-append-and-replace) on your datasource and on the ingestion tasks.
+* Set `skipOffsetFromLatest` to reduce the chance of conflicts between ingestion and compaction. See more details in [Skip latest segments from compaction](#skip-compaction-for-latest-segments).
+* Increase the priority value of compaction tasks relative to ingestion tasks. Only recommended for advanced users. This approach can cause ingestion jobs to fail or lag. To change the priority of compaction tasks, set `taskPriority` to the desired priority value in the auto-compaction configuration. For details on the priority values of different task types, see [Lock priority](../ingestion/tasks.md#lock-priority).
+
+### Enable concurrent append and replace
+
+You can use concurrent append and replace to safely replace the existing data in an interval of a datasource while new data is being appended to that interval even during compaction.
+
+To do this, you need to update your datasource to allow concurrent append and replace tasks:
+
+* If you're using the API, include the following `taskContext` property in your API call: `"useConcurrentLocks": "true"`
+* If you're using the UI, enable **Allow concurrent compactions (experimental)** in the **Compaction config** for your datasource.
+
+You'll also need to update your ingestion jobs to include a task lock.
Review Comment:
   You'll also need to update your ingestion jobs for the datasource to include `"useConcurrentLocks": true` in the taskContext.

##########
docs/ingestion/concurrent-append-replace.md:
##########
@@ -0,0 +1,143 @@
+---
+id: concurrent-append-replace
+title: Concurrent append and replace
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+:::info
+Concurrent append and replace is an [experimental feature](../development/experimental.md) available for JSON-based batch and streaming. It is not currently available for SQL-based ingestion.
+:::
+
+Concurrent append and replace safely replaces the existing data in an interval of a datasource while new data is being appended to that interval. One of the most common applications of this feature is appending new data (using say streaming ingestion) to an interval while compaction of that interval is already in progress.
+
+To set up concurrent append and replace, use the context flag `useConcurrentLocks`. Druid will then determine the correct lock type for you, either append or replace. Although you can set the type of lock manually, we don't recommend it.
+
+## Update the compaction settings
+
+If you want to append data to a datasource while compaction is running, you need to enable concurrent append and replace for the datasource by updating the compaction settings.
+
+### Update the compaction settings with the UI
+
+In the **Compaction config** for a datasource, enable **Allow concurrent compactions (experimental)**.
+
+For details on accessing the compaction config in the UI, see [Enable automatic compaction with the web console](../data-management/automatic-compaction.md#web-console).
+
+### Update the compaction settings with the API
+
+Add the `taskContext` like you would any other automatic compaction setting through the API:
+
+```shell
+curl --location --request POST 'http://localhost:8081/druid/coordinator/v1/config/compaction' \
+--header 'Content-Type: application/json' \
+--data-raw '{
+  "dataSource": "YOUR_DATASOURCE",
+  "taskContext": {
+    "useConcurrentLocks": "true"

Review Comment:
   No double quotes around true

##########
docs/ingestion/concurrent-append-replace.md:
##########
@@ -0,0 +1,143 @@
+---
+id: concurrent-append-replace
+title: Concurrent append and replace
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+:::info
+Concurrent append and replace is an [experimental feature](../development/experimental.md) available for JSON-based batch and streaming. It is not currently available for SQL-based ingestion.
+:::
+
+Concurrent append and replace safely replaces the existing data in an interval of a datasource while new data is being appended to that interval. One of the most common applications of this feature is appending new data (using say streaming ingestion) to an interval while compaction of that interval is already in progress.
+
+To set up concurrent append and replace, use the context flag `useConcurrentLocks`. Druid will then determine the correct lock type for you, either append or replace. Although you can set the type of lock manually, we don't recommend it.
+
+## Update the compaction settings
+
+If you want to append data to a datasource while compaction is running, you need to enable concurrent append and replace for the datasource by updating the compaction settings.
+
+### Update the compaction settings with the UI
+
+In the **Compaction config** for a datasource, enable **Allow concurrent compactions (experimental)**.
+
+For details on accessing the compaction config in the UI, see [Enable automatic compaction with the web console](../data-management/automatic-compaction.md#web-console).
+
+### Update the compaction settings with the API
+
+Add the `taskContext` like you would any other automatic compaction setting through the API:
+
+```shell
+curl --location --request POST 'http://localhost:8081/druid/coordinator/v1/config/compaction' \
+--header 'Content-Type: application/json' \
+--data-raw '{
+  "dataSource": "YOUR_DATASOURCE",
+  "taskContext": {
+    "useConcurrentLocks": "true"
+  }
+}'
+```
+
+## Add a task lock to your ingestion job

Review Comment:
   Add a task lock sounds a bit confusing. (Since every ingestion job uses task locks)
   "Configure task lock type" may be better. Or simply "Enable concurrent locks"?
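Taken together, the fixes requested in the review comments above would make the compaction config payload look roughly like this (a sketch only; `YOUR_DATASOURCE` is a placeholder, and the boolean is unquoted):

```json
{
  "dataSource": "YOUR_DATASOURCE",
  "taskContext": {
    "useConcurrentLocks": true
  }
}
```

Note that `true` here is a JSON boolean, not the string `"true"` shown in the quoted diff.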
##########
docs/data-management/automatic-compaction.md:
##########
@@ -152,6 +142,34 @@
 druid.coordinator.compaction.duties=["compactSegments"]
 druid.coordinator.compaction.period=PT60S
 ```
+## Avoid conflicts with ingestion
+
+Compaction tasks may be interrupted when they interfere with ingestion. For example, this occurs when an ingestion task needs to write data to a segment for a time interval locked for compaction. If there are continuous failures that prevent compaction from making progress, consider one of the following strategies:
+
+* Enable [concurrent append and replace tasks](#enable-concurrent-append-and-replace) on your datasource and on the ingestion tasks.
+* Set `skipOffsetFromLatest` to reduce the chance of conflicts between ingestion and compaction. See more details in [Skip latest segments from compaction](#skip-compaction-for-latest-segments).
+* Increase the priority value of compaction tasks relative to ingestion tasks. Only recommended for advanced users. This approach can cause ingestion jobs to fail or lag. To change the priority of compaction tasks, set `taskPriority` to the desired priority value in the auto-compaction configuration. For details on the priority values of different task types, see [Lock priority](../ingestion/tasks.md#lock-priority).
+
+### Enable concurrent append and replace
+
+You can use concurrent append and replace to safely replace the existing data in an interval of a datasource while new data is being appended to that interval even during compaction.
+
+To do this, you need to update your datasource to allow concurrent append and replace tasks:
+
+* If you're using the API, include the following `taskContext` property in your API call: `"useConcurrentLocks": "true"`

Review Comment:
   No double quotes surrounding true

##########
docs/ingestion/concurrent-append-replace.md:
##########
@@ -0,0 +1,143 @@
+---
+id: concurrent-append-replace
+title: Concurrent append and replace
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+:::info
+Concurrent append and replace is an [experimental feature](../development/experimental.md) available for JSON-based batch and streaming. It is not currently available for SQL-based ingestion.
+:::
+
+Concurrent append and replace safely replaces the existing data in an interval of a datasource while new data is being appended to that interval. One of the most common applications of this feature is appending new data (using say streaming ingestion) to an interval while compaction of that interval is already in progress.
+
+To set up concurrent append and replace, use the context flag `useConcurrentLocks`. Druid will then determine the correct lock type for you, either append or replace.
Although you can set the type of lock manually, we don't recommend it.
+
+## Update the compaction settings
+
+If you want to append data to a datasource while compaction is running, you need to enable concurrent append and replace for the datasource by updating the compaction settings.
+
+### Update the compaction settings with the UI
+
+In the **Compaction config** for a datasource, enable **Allow concurrent compactions (experimental)**.
+
+For details on accessing the compaction config in the UI, see [Enable automatic compaction with the web console](../data-management/automatic-compaction.md#web-console).
+
+### Update the compaction settings with the API
+
+Add the `taskContext` like you would any other automatic compaction setting through the API:
+
+```shell
+curl --location --request POST 'http://localhost:8081/druid/coordinator/v1/config/compaction' \
+--header 'Content-Type: application/json' \
+--data-raw '{
+  "dataSource": "YOUR_DATASOURCE",
+  "taskContext": {
+    "useConcurrentLocks": "true"
+  }
+}'
+```
+
+## Add a task lock to your ingestion job
+
+You also need to configure the ingestion job to allow concurrent tasks.
+
+You can provide the context parameter like any other parameter for ingestion jobs through the API or the UI.
+
+### Add a task lock using the Druid console
+
+As part of the **Load data** wizard for classic batch (JSON-based ingestion) and streaming ingestion, enable the following config on the **Publish** step: **Allow concurrent tasks (experimental)**.
+
+### Add the task lock through the API
+
+Add the following JSON snippet to your supervisor or ingestion spec if you're using the API:
+
+```json
+"context": {
+  "useConcurrentLocks": true
+}
+```
+
+## Task lock types
+
+We recommend that you use the `useConcurrentLocks` context parameter so that Druid automatically determines the task lock types for you. If, for some reason, you need to manually set the task lock types explicitly, you can read more about them in this section.
+
+<details><summary>Click here to read more about the lock types.</summary>
+
+Druid uses task lock

Review Comment:
   Is this incomplete?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
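For context on the truncated lock-types section quoted above: when `useConcurrentLocks` is not set, the lock type can instead be chosen explicitly through the `taskLockType` context parameter. A hedged sketch of what that looks like for an appending task (the `APPEND`/`REPLACE` values assume the append and replace lock types this page introduces; verify against the final published doc):

```json
"context": {
  "taskLockType": "APPEND"
}
```

A replacing task, such as compaction, would use `"taskLockType": "REPLACE"` instead; as the page under review recommends, prefer `useConcurrentLocks` and let Druid pick the lock type.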
