AmatyaAvadhanula commented on code in PR #15218:
URL: https://github.com/apache/druid/pull/15218#discussion_r1371108902


##########
docs/data-management/automatic-compaction.md:
##########
@@ -203,6 +203,85 @@ The following auto-compaction configuration compacts 
updates the `wikipedia` seg
 }
 ```
 
+## Concurrent append and replace
+
+:::info
+Concurrent append and replace is an [experimental 
feature](../development/experimental.md) and is not currently available for 
SQL-based ingestion.
+:::
+
+If you enable automatic compaction, you can use concurrent append and replace to compact data while it is being ingested by streaming or legacy JSON-based batch ingestion jobs.
+
+Setting up concurrent append and replace is a two-step process: first, update your datasource; second, update your ingestion job.
+
+Using concurrent append and replace can be beneficial in the following scenarios:
+
+- The job with an `APPEND` task and the job with a `REPLACE` task have the same segment granularity. For example, a datasource and its streaming ingestion job have the same granularity.
+- The job with an `APPEND` task has a finer segment granularity than the job with a `REPLACE` task.
+
+We do not recommend using concurrent append and replace when the job with an `APPEND` task has a coarser segment granularity than the job with a `REPLACE` task: for example, if the `APPEND` job has yearly granularity and the `REPLACE` job has monthly granularity, the job that finishes second will fail.
+ 
+### Configure concurrent append and replace
+
+#### Update the compaction settings with the API
+
+First, prepare your datasource for concurrent append and replace by setting its task lock type to `REPLACE`.
+Add the `taskContext` as you would any other auto-compaction setting through the API:
+
+```shell
+curl --location --request POST 'http://localhost:8081/druid/coordinator/v1/config/compaction' \
+--header 'Content-Type: application/json' \
+--data-raw '{
+    "dataSource": "YOUR_DATASOURCE",
+    "taskContext": {
+        "taskLockType": "REPLACE"
+    }
+}'
+```
+
+#### Update the compaction settings with the UI
+
+In the **Compaction config** for a datasource, set **Allow concurrent compactions (experimental)** to **True**.
+
+#### Add a task lock type to your ingestion job
+
+Next, configure the task lock type for your ingestion job. For streaming jobs, the context parameter goes in your supervisor spec. For legacy JSON-based batch ingestion, the context parameter goes in your ingestion spec. You can provide the context parameter through the API or the UI, as you would any other context parameter for streaming or JSON-based batch ingestion.
+
+##### Add the task lock type through the API

Review Comment:
   Could we please explicitly add that a streaming supervisor spec must always 
have an APPEND lock when using concurrent append and replace?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

