kfaraz commented on code in PR #15218:
URL: https://github.com/apache/druid/pull/15218#discussion_r1371336412
##########
docs/ingestion/ingestion-spec.md:
##########

````diff
@@ -530,3 +530,11 @@ You can enable front coding with all types of ingestion. For information on defi
 
 Beyond these properties, each ingestion method has its own specific tuning properties. See the documentation for each [ingestion method](./index.md#ingestion-methods) for details.
+
+## Context
````

Review Comment:
We should skip adding this section for now, as this is an experimental feature. If we are adding a task context section to this doc, it would first need to talk about other, more important parameters.

##########
docs/development/extensions-core/kafka-supervisor-reference.md:
##########

````diff
@@ -258,4 +258,12 @@ The following table outlines the configuration options for `indexSpec`:
 |`bitmap`|Object|Compression format for bitmap indexes. Druid supports roaring and concise bitmap types.|No|Roaring|
 |`dimensionCompression`|String|Compression format for dimension columns. Choose from `LZ4`, `LZF`, `ZSTD` or `uncompressed`.|No|`LZ4`|
 |`metricCompression`|String|Compression format for primitive type metric columns. Choose from `LZ4`, `LZF`, `ZSTD`, `uncompressed` or `none`.|No|`LZ4`|
-|`longEncoding`|String|Encoding format for metric and dimension columns with type long. Choose from `auto` or `longs`. `auto` encodes the values using offset or lookup table depending on column cardinality, and store them with variable size. `longs` stores the value as is with 8 bytes each.|No|`longs`|
\ No newline at end of file
+|`longEncoding`|String|Encoding format for metric and dimension columns with type long. Choose from `auto` or `longs`. `auto` encodes the values using offset or lookup table depending on column cardinality, and store them with variable size. `longs` stores the value as is with 8 bytes each.|No|`longs`|
+
+## Context
````

Review Comment:
We should not add a separate section for this right now. We can do this later when the feature is more well-baked.
##########
docs/data-management/automatic-compaction.md:
##########

````diff
@@ -203,6 +203,85 @@ The following auto-compaction configuration compacts updates the `wikipedia` seg
 }
 ```
+## Concurrent append and replace
+
+:::info
+Concurrent append and replace is an [experimental feature](../development/experimental.md) and is not currently available for SQL-based ingestion.
+:::
+
+If you enable automatic compaction, you can use concurrent append and replace to concurrently compact data as you ingest it for streaming and legacy JSON-based batch ingestion.
````

Review Comment:
```suggestion
This feature allows you to safely replace the existing data in an interval of a datasource while new data is being appended to that interval. One of the most common applications of this is appending new data (using, say, streaming ingestion) to an interval while compaction of that interval is already in progress.
```

##########
docs/data-management/automatic-compaction.md:
##########

````diff
@@ -203,6 +203,85 @@ The following auto-compaction configuration compacts updates the `wikipedia` seg
 }
 ```
+## Concurrent append and replace
+
+:::info
+Concurrent append and replace is an [experimental feature](../development/experimental.md) and is not currently available for SQL-based ingestion.
+:::
+
+If you enable automatic compaction, you can use concurrent append and replace to concurrently compact data as you ingest it for streaming and legacy JSON-based batch ingestion.
+
+Setting up concurrent append and replace is a two-step process. The first is to update your datasource and the second is to update your ingestion job.
````

Review Comment:
This is not exactly correct. It doesn't make a lot of sense to "update a datasource" unless you mean adding data to a datasource. Moreover, we shouldn't even look at this as a two-step process, but rather as an opt-in behaviour. Any ingestion job that wants to run concurrently with other ingestion jobs needs to use the correct lock types. Please see the other suggestion.
##########
docs/data-management/compaction.md:
##########

````diff
@@ -43,18 +44,20 @@ By default, compaction does not modify the underlying data of the segments. Howe
 
 Compaction does not improve performance in all situations. For example, if you rewrite your data with each ingestion task, you don't need to use compaction. See [Segment optimization](../operations/segment-optimization.md) for additional guidance to determine if compaction will help in your environment.
 
-## Types of compaction
+## Choose your compaction type
````

Review Comment:
I don't think this heading aligns with the rest of the headings. Also, the type of compaction is not really much of a choice in the way that, say, the partitioning type is a choice (range, hashed, or dynamic, where we are choosing three different paths that give three different results). We should just call this `Ways to run compaction` or something in a similar vein.

##########
docs/data-management/automatic-compaction.md:
##########

````diff
@@ -203,6 +203,85 @@ The following auto-compaction configuration compacts updates the `wikipedia` seg
 }
 ```
+## Concurrent append and replace
+
+:::info
+Concurrent append and replace is an [experimental feature](../development/experimental.md) and is not currently available for SQL-based ingestion.
+:::
+
+If you enable automatic compaction, you can use concurrent append and replace to concurrently compact data as you ingest it for streaming and legacy JSON-based batch ingestion.
+
+Setting up concurrent append and replace is a two-step process. The first is to update your datasource and the second is to update your ingestion job.
+
+Using concurrent append and replace in the following scenarios can be beneficial:
+
+- If the job with an `APPEND` task and the job with a `REPLACE` task have the same segment granularity. For example, when a datasource and its streaming ingestion job have the same granularity.
+- If the job with an `APPEND` task has a finer segment granularity than the replacing job.
````
Review Comment:
```suggestion
You can enable concurrent append and replace by ensuring the following:

- The append task (with `appendToExisting` set to `true`) has `taskLockType` set to `APPEND` in the task context.
- The replace task (with `appendToExisting` set to `false`) has `taskLockType` set to `REPLACE` in the task context.
- The segment granularity of the append task is equal to or finer than the segment granularity of the replace task.
```

##########
docs/data-management/automatic-compaction.md:
##########

````diff
@@ -203,6 +203,85 @@ The following auto-compaction configuration compacts updates the `wikipedia` seg
 }
 ```
+## Concurrent append and replace
+
+:::info
+Concurrent append and replace is an [experimental feature](../development/experimental.md) and is not currently available for SQL-based ingestion.
+:::
+
+If you enable automatic compaction, you can use concurrent append and replace to concurrently compact data as you ingest it for streaming and legacy JSON-based batch ingestion.
+
+Setting up concurrent append and replace is a two-step process. The first is to update your datasource and the second is to update your ingestion job.
+
+Using concurrent append and replace in the following scenarios can be beneficial:
+
+- If the job with an `APPEND` task and the job with a `REPLACE` task have the same segment granularity. For example, when a datasource and its streaming ingestion job have the same granularity.
+- If the job with an `APPEND` task has a finer segment granularity than the replacing job.
+
+We do not recommend using concurrent append and replace when the job with an `APPEND` task has a coarser granularity than the job with a `REPLACE` task. For example, if the `APPEND` job has a yearly granularity and the `REPLACE` job has a monthly granularity. The job that finishes second will fail.
````

Review Comment:
This point should be in a note or warning block.
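To make the suggested settings concrete, a sketch of an append task that opts in with the `APPEND` lock type might look like the following. Only `appendToExisting` and the `taskLockType` context key come from the suggestion itself; the `index_parallel` wrapper is abbreviated for illustration, and a complete spec would also need `dataSchema` and `tuningConfig`.

```json
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "appendToExisting": true
    }
  },
  "context": {
    "taskLockType": "APPEND"
  }
}
```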
Two more points to call out:

```
At any point in time:
- There can only be a single task that holds a `REPLACE` lock on a given interval of a datasource.
- There may be multiple tasks that hold `APPEND` locks on a given interval of a datasource and append data to that interval simultaneously.
```
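As a companion sketch for the replace side, a manual compaction task could opt in with the `REPLACE` lock type in its context. The datasource name and interval below are placeholders, not values from this PR, and the `ioConfig` is abbreviated:

```json
{
  "type": "compact",
  "dataSource": "wikipedia",
  "ioConfig": {
    "type": "compact",
    "inputSpec": {
      "type": "interval",
      "interval": "2023-01-01/2023-02-01"
    }
  },
  "context": {
    "taskLockType": "REPLACE"
  }
}
```

Per the locking points above, only one such task at a time could hold the `REPLACE` lock on that interval, while multiple `APPEND` tasks could write to it concurrently.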
