AmatyaAvadhanula commented on code in PR #15218: URL: https://github.com/apache/druid/pull/15218#discussion_r1371108902
########## docs/data-management/automatic-compaction.md:
##########


@@ -203,6 +203,85 @@ The following auto-compaction configuration compacts updates the `wikipedia` seg
 }
 ```
+## Concurrent append and replace
+
+:::info
+Concurrent append and replace is an [experimental feature](../development/experimental.md) and is not currently available for SQL-based ingestion.
+:::
+
+If you enable automatic compaction, you can use concurrent append and replace to compact data concurrently as you ingest it for streaming and legacy JSON-based batch ingestion.
+
+Setting up concurrent append and replace is a two-step process: first, update your datasource; second, update your ingestion job.
+
+Concurrent append and replace can be beneficial in the following scenarios:
+
+- The job with an `APPEND` task and the job with a `REPLACE` task have the same segment granularity. For example, a datasource and its streaming ingestion job have the same granularity.
+- The job with an `APPEND` task has a finer segment granularity than the replacing job.
+
+We do not recommend using concurrent append and replace when the job with an `APPEND` task has a coarser granularity than the job with a `REPLACE` task, for example, when the `APPEND` job has yearly granularity and the `REPLACE` job has monthly granularity. In that case, the job that finishes second fails.
+
+### Configure concurrent append and replace
+
+##### Update the compaction settings with the API
+
+First, prepare your datasource for concurrent append and replace by setting its task lock type to `REPLACE`.
+Add the `taskContext` like you would any other auto-compaction setting through the API:
+
+```shell
+curl --location --request POST 'http://localhost:8081/druid/coordinator/v1/config/compaction' \
+--header 'Content-Type: application/json' \
+--data-raw '{
+    "dataSource": "YOUR_DATASOURCE",
+    "taskContext": {
+        "taskLockType": "REPLACE"
+    }
+}'
+```
+
+##### Update the compaction settings with the UI
+
+In the **Compaction config** for a datasource, set **Allow concurrent compactions (experimental)** to **True**.
+
+#### Add a task lock type to your ingestion job
+
+Next, configure the task lock type for your ingestion job. For streaming jobs, the context parameter goes in your supervisor spec. For legacy JSON-based batch ingestion, the context parameter goes in your ingestion spec. You can provide the context parameter through the API, like any other parameter for streaming or JSON-based batch ingestion, or through the UI.
+
+##### Add the task lock type through the API

Review Comment:
   Could we please explicitly add that a supervisor spec must always have an APPEND lock when using concurrent append and replace?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
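For illustration, a minimal sketch of the example the reviewer is asking the docs to add: a streaming supervisor spec that declares an `APPEND` task lock in its `context`, mirroring the `taskLockType` pattern from the compaction config in the diff above. The surrounding fields (`type`, `spec`, the datasource and topic names) are placeholder assumptions, not taken from the PR:

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": { "dataSource": "YOUR_DATASOURCE" },
    "ioConfig": { "topic": "your_topic" }
  },
  "context": {
    "taskLockType": "APPEND"
  }
}
```

Per the review comment, a supervisor spec should always use the `APPEND` lock when concurrent append and replace is enabled; the `REPLACE` lock belongs on the compaction (replacing) side.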
