[druid] branch master updated: Docs: Typo and language cleanup in Kinesis ingestion docs (#14356)

kfaraz Thu, 01 Jun 2023 19:48:52 -0700

This is an automated email from the ASF dual-hosted git repository.

kfaraz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git



The following commit(s) were added to refs/heads/master by this push:
     new 55effd92cf Docs: Typo and language cleanup in Kinesis ingestion docs 
(#14356)
55effd92cf is described below

commit 55effd92cfe80c23e1005e70a86da585764a7771
Author: Andreas Maechler <[email protected]>
AuthorDate: Thu Jun 1 20:48:41 2023 -0600

    Docs: Typo and language cleanup in Kinesis ingestion docs (#14356)
    
    Co-authored-by: Katya Macedo  <[email protected]>
    Co-authored-by: Kashif Faraz <[email protected]>
    Co-authored-by: Victoria Lim <[email protected]>
---
 .../extensions-core/kinesis-ingestion.md           | 120 +++++++++++----------
 1 file changed, 63 insertions(+), 57 deletions(-)

diff --git a/docs/development/extensions-core/kinesis-ingestion.md 
b/docs/development/extensions-core/kinesis-ingestion.md
index 046ffd2ad6..36e65f538c 100644
--- a/docs/development/extensions-core/kinesis-ingestion.md
+++ b/docs/development/extensions-core/kinesis-ingestion.md
@@ -24,10 +24,10 @@ sidebar_label: "Amazon Kinesis"
   -->
 
 When you enable the Kinesis indexing service, you can configure *supervisors* 
on the Overlord to manage the creation and lifetime of Kinesis indexing tasks. 
These indexing tasks read events using Kinesis' own shard and sequence number 
mechanism to guarantee exactly-once ingestion. The supervisor oversees the 
state of the indexing tasks to:
-  - coordinate handoffs
-  - manage failures
-  - ensure that scalability and replication requirements are maintained.
 
+- coordinate handoffs
+- manage failures
+- ensure that scalability and replication requirements are maintained.
 
 To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` 
core Apache Druid extension (see
 [Including Extensions](../../configuration/extensions.md#loading-extensions)).
@@ -38,12 +38,11 @@ To use the Kinesis indexing service, load the 
`druid-kinesis-indexing-service` c
 
 To use the Kinesis indexing service, load the `druid-kinesis-indexing-service` 
extension on both the Overlord and the MiddleManagers. Druid starts a 
supervisor for a dataSource when you submit a supervisor spec. Submit your 
supervisor spec to the following endpoint:
 
-
 `http://<OVERLORD_IP>:<OVERLORD_PORT>/druid/indexer/v1/supervisor`
 
 For example:
 
-```
+```sh
 curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json 
http://localhost:8090/druid/indexer/v1/supervisor
 ```
 
@@ -111,7 +110,6 @@ Where the file `supervisor-spec.json` contains a Kinesis 
supervisor spec:
 }
 ```
 
-
 ## Supervisor Spec
 
 |Field|Description|Required|
@@ -136,7 +134,7 @@ Where the file `supervisor-spec.json` contains a Kinesis 
supervisor spec:
 |`period`|ISO8601 Period|How often the supervisor will execute its management 
logic. Note that the supervisor will also run in response to certain events 
(such as tasks succeeding, failing, and reaching their taskDuration) so this 
value specifies the maximum time between iterations.|no (default == PT30S)|
 |`useEarliestSequenceNumber`|Boolean|If a supervisor is managing a dataSource 
for the first time, it will obtain a set of starting sequence numbers from 
Kinesis. This flag determines whether it retrieves the earliest or latest 
sequence numbers in Kinesis. Under normal circumstances, subsequent tasks will 
start from where the previous segments ended so this flag will only be used on 
first run.|no (default == false)|
 |`completionTimeout`|ISO8601 Period|The length of time to wait before 
declaring a publishing task as failed and terminating it. If this is set too 
low, your tasks may never publish. The publishing clock for a task begins 
roughly after `taskDuration` elapses.|no (default == PT6H)|
-|`lateMessageRejectionPeriod`|ISO8601 Period|Configure tasks to reject 
messages with timestamps earlier than this period before the task was created; 
for example if this is set to `PT1H` and the supervisor creates a task at 
*2016-01-01T12:00Z*, messages with timestamps earlier than *2016-01-01T11:00Z* 
will be dropped. This may help prevent concurrency issues if your data stream 
has late messages and you have multiple pipelines that need to operate on the 
same segments (e.g. a realtime an [...]
+|`lateMessageRejectionPeriod`|ISO8601 Period|Configure tasks to reject 
messages with timestamps earlier than this period before the task was created; 
for example if this is set to `PT1H` and the supervisor creates a task at 
*2016-01-01T12:00Z*, messages with timestamps earlier than *2016-01-01T11:00Z* 
will be dropped. This may help prevent concurrency issues if your data stream 
has late messages and you have multiple pipelines that need to operate on the 
same segments (e.g. a streaming a [...]
 |`earlyMessageRejectionPeriod`|ISO8601 Period|Configure tasks to reject 
messages with timestamps later than this period after the task reached its 
taskDuration; for example if this is set to `PT1H`, the taskDuration is set to 
`PT1H` and the supervisor creates a task at *2016-01-01T12:00Z*. Messages with 
timestamps later than *2016-01-01T14:00Z* will be dropped. **Note:** Tasks 
sometimes run past their task duration, for example, in cases of supervisor 
failover. Setting `earlyMessageRejec [...]
 |`recordsPerFetch`|Integer|The number of records to request per call to fetch 
records from Kinesis. See [Determining fetch 
settings](#determining-fetch-settings).|no (see [Determining fetch 
settings](#determining-fetch-settings) for defaults)|
 |`fetchDelayMillis`|Integer|Time in milliseconds to wait between subsequent 
calls to fetch records from Kinesis. See [Determining fetch 
settings](#determining-fetch-settings).|no (default == 0)|
@@ -173,6 +171,7 @@ The Kinesis indexing service reports lag metrics measured 
in time milliseconds r
 | `scaleOutStep` | Number of tasks to add at a time when scaling out | no 
(default == 2) |
 
 The following example demonstrates a supervisor spec with `lagBased` 
autoScaler enabled:
+
 ```json
 {
   "type": "kinesis",
@@ -255,7 +254,8 @@ The following example demonstrates a supervisor spec with 
`lagBased` autoScaler
 Kinesis indexing service supports both 
[`inputFormat`](../../ingestion/data-formats.md#input-format) and 
[`parser`](../../ingestion/data-formats.md#parser) to specify the data format.
 Use the `inputFormat` to specify the data format for Kinesis indexing service 
unless you need a format only supported by the legacy `parser`.
 
-Supported `inputFormat`s include:
+Supported values for `inputFormat` include:
+
 - `csv`
 - `delimited`
 - `json`
@@ -284,10 +284,10 @@ The `tuningConfig` is optional. If no `tuningConfig` is 
specified, default param
 |`indexSpecForIntermediatePersists`|Object|Defines segment storage format 
options to be used at indexing time for intermediate persisted temporary 
segments. This can be used to disable dimension/metric compression on 
intermediate segments to reduce memory required for final merging. However, 
disabling compression on intermediate segments might increase page cache use 
while they are used before getting merged into final segment published, see 
[IndexSpec](#indexspec) for possible values.|  [...]
 |`reportParseExceptions`|Boolean|If true, exceptions encountered during 
parsing will be thrown and will halt ingestion; if false, unparseable rows and 
fields will be skipped.|no (default == false)|
 |`handoffConditionTimeout`|Long| Milliseconds to wait for segment handoff. It 
must be >= 0, where 0 means to wait forever.| no (default == 0)|
-|`resetOffsetAutomatically`|Boolean|Controls behavior when Druid needs to read 
Kinesis messages that are no longer available.<br/><br/>If false, the exception 
will bubble up, which will cause your tasks to fail and ingestion to halt. If 
this occurs, manual intervention is required to correct the situation; 
potentially using the [Reset Supervisor 
API](../../api-reference/api-reference.md#supervisors). This mode is useful for 
production, since it will make you aware of issues with ingestio [...]
+|`resetOffsetAutomatically`|Boolean|Controls behavior when Druid needs to read 
Kinesis messages that are no longer available.<br/><br/>If false, the exception 
bubbles up, causing tasks to fail and ingestion to halt. If this occurs, manual 
intervention is required to correct the situation, potentially using the [Reset 
Supervisor API](../../api-reference/api-reference.md#supervisors). This mode is 
useful for production, since it highlights issues with ingestion.<br/><br/>If 
true, Druid aut [...]
 |`skipSequenceNumberAvailabilityCheck`|Boolean|Whether to enable checking if 
the current sequence number is still available in a particular Kinesis shard. 
If set to false, the indexing task will attempt to reset the current sequence 
number (or not), depending on the value of `resetOffsetAutomatically`.|no 
(default == false)|
 |`workerThreads`|Integer|The number of threads that the supervisor uses to 
handle requests/responses for worker tasks, along with any other internal 
asynchronous operation.|no (default == min(10, taskCount))|
-|`chatAsync`|Boolean| If true, use asynchronous communication with indexing 
tasks, and ignore the `chatThreads` parameter. If false, use synchronous 
communication in a thread pool of size `chatThreads`.                           
                                                                                
                                                                                
                                                                                
                        [...]
+|`chatAsync`|Boolean| If true, the supervisor uses asynchronous communication 
with indexing tasks and ignores the `chatThreads` parameter. If false, the 
supervisor uses synchronous communication in a thread pool of size 
`chatThreads`.| no (default == true)|
 |`chatThreads`|Integer| The number of threads that will be used for 
communicating with indexing tasks. Ignored if `chatAsync` is `true` (the 
default).| no (default == min(10, taskCount * replicas))|
 |`chatRetries`|Integer|The number of times HTTP requests to indexing tasks 
will be retried before considering tasks unresponsive.| no (default == 8)|
 |`httpTimeout`|ISO8601 Period|How long to wait for a HTTP response from an 
indexing task.|no (default == PT10S)|
@@ -295,15 +295,15 @@ The `tuningConfig` is optional. If no `tuningConfig` is 
specified, default param
 |`recordBufferSize`|Integer|Size of the buffer (number of events) used between 
the Kinesis fetch threads and the main ingestion thread.|no (see [Determining 
fetch settings](#determining-fetch-settings) for defaults)|
 |`recordBufferOfferTimeout`|Integer|Length of time in milliseconds to wait for 
space to become available in the buffer before timing out.| no (default == 
5000)|
 |`recordBufferFullWait`|Integer|Length of time in milliseconds to wait for the 
buffer to drain before attempting to fetch records from Kinesis again.|no 
(default == 5000)|
-|`fetchThreads`|Integer|Size of the pool of threads fetching data from 
Kinesis. There is no benefit in having more threads than Kinesis shards.|no 
(default == procs * 2, where "procs" is the number of processors available to 
the task)                                                                  |
+|`fetchThreads`|Integer|Size of the pool of threads fetching data from 
Kinesis. There is no benefit in having more threads than Kinesis shards.|no 
(default == procs * 2, where `procs` is the number of processors available to 
the task)|
 |`segmentWriteOutMediumFactory`|Object|Segment write-out medium to use when 
creating segments. See below for more information.|no (not specified by 
default, the value from `druid.peon.defaultSegmentWriteOutMediumFactory.type` 
is used)|
 |`intermediateHandoffPeriod`|ISO8601 Period|How often the tasks should hand 
off segments. Handoff will happen either if `maxRowsPerSegment` or 
`maxTotalRows` is hit or every `intermediateHandoffPeriod`, whichever happens 
earlier.| no (default == P2147483647D)|
 |`logParseExceptions`|Boolean|If true, log an error message when a parsing 
exception occurs, containing information about the row where the error 
occurred.|no, default == false|
 |`maxParseExceptions`|Integer|The maximum number of parse exceptions that can 
occur before the task halts ingestion and fails. Overridden if 
`reportParseExceptions` is set.|no, unlimited default|
 |`maxSavedParseExceptions`|Integer|When a parse exception occurs, Druid can 
keep track of the most recent parse exceptions. "maxSavedParseExceptions" 
limits how many exception instances will be saved. These saved exceptions will 
be made available after the task finishes in the [task completion 
report](../../ingestion/tasks.md#task-reports). Overridden if 
`reportParseExceptions` is set.|no, default == 0|
 |`maxRecordsPerPoll`|Integer|The maximum number of records/events to be 
fetched from buffer per poll. The actual maximum will be 
`Max(maxRecordsPerPoll, Max(bufferSize, 1))`|no (see [Determining fetch 
settings](#determining-fetch-settings) for defaults)|
-|`repartitionTransitionDuration`|ISO8601 Period|When shards are split or 
merged, the supervisor will recompute shard -> task group mappings, and signal 
any running tasks created under the old mappings to stop early at (current time 
+ `repartitionTransitionDuration`). Stopping the tasks early allows Druid to 
begin reading from the new shards more quickly. The repartition transition wait 
time controlled by this property gives the stream additional time to write 
records to the new shards af [...]
-|`offsetFetchPeriod`|ISO8601 Period|How often the supervisor queries Kinesis 
and the indexing tasks to fetch current offsets and calculate lag. If the 
user-specified value is below the minimum value (`PT5S`), the supervisor 
ignores the value and uses the minimum value instead.|no (default == PT30S, min 
== PT5S)|
+|`repartitionTransitionDuration`|ISO8601 period|When shards are split or 
merged, the supervisor recomputes shard to task group mappings. The supervisor 
also signals any running tasks created under the old mappings to stop early at 
(current time + `repartitionTransitionDuration`). Stopping the tasks early 
allows Druid to begin reading from the new shards more quickly. The repartition 
transition wait time controlled by this property gives the stream additional 
time to write records to the  [...]
+|`offsetFetchPeriod`|ISO8601 period|How often the supervisor queries Kinesis 
and the indexing tasks to fetch current offsets and calculate lag. If the 
user-specified value is below the minimum value (`PT5S`), the supervisor 
ignores the value and uses the minimum value instead.|no (default == PT30S, min 
== PT5S)|
 |`useListShards`|Boolean|Indicates if `listShards` API of AWS Kinesis SDK can 
be used to prevent `LimitExceededException` during ingestion. Please note that 
the necessary `IAM` permissions must be set for this to work.|no (default == 
false)|
 
 #### IndexSpec
@@ -343,7 +343,8 @@ For all supervisor APIs, check [Supervisor 
APIs](../../api-reference/api-referen
 ### AWS Authentication
 
 To authenticate with AWS, you must provide your AWS access key and AWS secret 
key via `runtime.properties`, for example:
-```
+
+```text
 -Ddruid.kinesis.accessKey=123 -Ddruid.kinesis.secretKey=456
 ```
 
@@ -352,18 +353,18 @@ look for credentials set in environment variables, via 
[Web Identity Token](http
 profile provider (in this order).
 
 To ingest data from Kinesis, ensure that the policy attached to your IAM role 
contains the necessary permissions.
-The permissions needed depend on the value of `useListShards`.
+The required permissions depend on the value of `useListShards`.
 
 If the `useListShards` flag is set to `true`, you need following permissions:
 
-* `ListStreams`: required to list your data streams
-* `Get*`: required for `GetShardIterator`
-* `GetRecords`: required to get data records from a data stream's shard
-* `ListShards` : required to get the shards for a stream of interest
+- `ListStreams`: required to list your data streams
+- `Get*`: required for `GetShardIterator`
+- `GetRecords`: required to get data records from a data stream's shard
+- `ListShards` : required to get the shards for a stream of interest
 
 **Example policy**
 
-```
+```json
 [
   {
     "Effect": "Allow",
@@ -380,14 +381,14 @@ If the `useListShards` flag is set to `true`, you need 
following permissions:
 
 If the `useListShards` flag is set to `false`, you need following permissions:
 
-* `ListStreams`: required to list your data streams
-* `Get*`: required for `GetShardIterator`
-* `GetRecords`: required to get data records from a data stream's shard
-* `DescribeStream`: required to describe the specified data stream
+- `ListStreams`: required to list your data streams
+- `Get*`: required for `GetShardIterator`
+- `GetRecords`: required to get data records from a data stream's shard
+- `DescribeStream`: required to describe the specified data stream
 
 **Example policy**
 
-```    
+```json
 [
   {
     "Effect": "Allow",
@@ -416,7 +417,7 @@ Indexing Service, Kinesis reports lag metrics measured in 
time difference in mil
 The status report also contains the supervisor's state and a list of recently 
thrown exceptions (reported as
 `recentErrors`, whose max size can be controlled using the 
`druid.supervisor.maxStoredExceptionEvents` configuration).
 There are two fields related to the supervisor's state - `state` and 
`detailedState`. The `state` field will always be
-one of a small number of generic states that are applicable to any type of 
supervisor, while the `detailedState` field
+one of a small number of generic states that apply to any type of supervisor, 
while the `detailedState` field
 will contain a more descriptive, implementation-specific state that may 
provide more insight into the supervisor's
 activities than the generic `state` field.
 
@@ -439,6 +440,7 @@ The list of `detailedState` values and their corresponding 
`state` mapping is as
 |STOPPING|STOPPING|The supervisor is stopping|
 
 On each iteration of the supervisor's run loop, the supervisor completes the 
following tasks in sequence:
+
   1) Fetch the list of shards from Kinesis and determine the starting sequence 
number for each shard (either based on the
   last processed sequence number if continuing, or starting from the beginning 
or ending of the stream if this is a new stream).
   2) Discover any running indexing tasks that are writing to the supervisor's 
datasource and adopt them if they match
@@ -477,25 +479,25 @@ it will just ensure that no indexing tasks are running 
until the supervisor is r
 
 ### Resetting Supervisors
 
-The `POST /druid/indexer/v1/supervisor/<supervisorId>/reset` operation clears 
stored 
-sequence numbers, causing the supervisor to start reading from either the 
earliest or 
-latest sequence numbers in Kinesis (depending on the value of 
`useEarliestSequenceNumber`). 
-After clearing stored sequence numbers, the supervisor kills and recreates 
active tasks, 
-so that tasks begin reading from valid sequence numbers. 
+The `POST /druid/indexer/v1/supervisor/<supervisorId>/reset` operation clears 
stored
+sequence numbers, causing the supervisor to start reading from either the 
earliest or
+latest sequence numbers in Kinesis (depending on the value of 
`useEarliestSequenceNumber`).
+After clearing stored sequence numbers, the supervisor kills and recreates 
active tasks,
+so that tasks begin reading from valid sequence numbers.
 
-Use care when using this operation! Resetting the supervisor may cause Kinesis 
messages 
-to be skipped or read twice, resulting in missing or duplicate data. 
+Use care when using this operation! Resetting the supervisor may cause Kinesis 
messages
+to be skipped or read twice, resulting in missing or duplicate data.
 
-The reason for using this operation is to recover from a state in which the 
supervisor 
-ceases operating due to missing sequence numbers. The indexing service keeps 
track of the latest 
-persisted sequence number in order to provide exactly-once ingestion 
guarantees across 
-tasks. 
+The reason for using this operation is to recover from a state in which the 
supervisor
+ceases operating due to missing sequence numbers. The indexing service keeps 
track of the latest
+persisted sequence number to provide exactly-once ingestion guarantees across
+tasks.
 
-Subsequent tasks must start reading from where the previous task completed in 
-order for the generated segments to be accepted. If the messages at the 
expected starting sequence numbers are 
-no longer available in Kinesis (typically because the message retention period 
has elapsed or the topic was 
-removed and re-created) the supervisor will refuse to start and in-flight 
tasks will fail. This operation 
-enables you to recover from this condition. 
+Subsequent tasks must start reading from where the previous task completed
+for the generated segments to be accepted. If the messages at the expected 
starting sequence numbers are
+no longer available in Kinesis (typically because the message retention period 
has elapsed or the topic was
+removed and re-created) the supervisor will refuse to start and in-flight 
tasks will fail. This operation
+enables you to recover from this condition.
 
 Note that the supervisor must be running for this endpoint to be available.
 
@@ -514,7 +516,7 @@ Kinesis indexing tasks run on MiddleManagers and are thus 
limited by the resourc
 cluster. In particular, you should make sure that you have sufficient worker 
capacity (configured using the
 `druid.worker.capacity` property) to handle the configuration in the 
supervisor spec. Note that worker capacity is
 shared across all types of indexing tasks, so you should plan your worker 
capacity to handle your total indexing load
-(e.g. batch processing, realtime tasks, merging tasks, etc.). If your workers 
run out of capacity, Kinesis indexing tasks
+(e.g. batch processing, streaming tasks, merging tasks, etc.). If your workers 
run out of capacity, Kinesis indexing tasks
 will queue and wait for the next available worker. This may cause queries to 
return partial results but will not result
 in data loss (assuming the tasks run before Kinesis purges those sequence 
numbers).
 
@@ -526,10 +528,10 @@ as it takes to generate segments, push segments to deep 
storage, and have them b
 The number of reading tasks is controlled by `replicas` and `taskCount`. In 
general, there will be `replicas * taskCount`
 reading tasks, the exception being if taskCount > {numKinesisShards} in which 
case {numKinesisShards} tasks will
 be used instead. When `taskDuration` elapses, these tasks will transition to 
publishing state and `replicas * taskCount`
-new reading tasks will be created. Therefore to allow for reading tasks and 
publishing tasks to run concurrently, there
+new reading tasks will be created. Therefore, to allow for reading tasks and 
publishing tasks to run concurrently, there
 should be a minimum capacity of:
 
-```
+```text
 workerCapacity = 2 * replicas * taskCount
 ```
 
@@ -555,8 +557,8 @@ fail-overs.
 A supervisor is stopped via the `POST 
/druid/indexer/v1/supervisor/<supervisorId>/terminate` endpoint. This places a
 tombstone marker in the database (to prevent the supervisor from being 
reloaded on a restart) and then gracefully
 shuts down the currently running supervisor. When a supervisor is shut down in 
this way, it will instruct its
-managed tasks to stop reading and begin publishing their segments immediately. 
The call to the shutdown endpoint will
-return after all tasks have been signalled to stop but before the tasks finish 
publishing their segments.
+managed tasks to stop reading. The tasks will begin publishing their segments 
immediately. The call to the shutdown
+endpoint will return after all tasks have been signalled to stop but before 
the tasks finish publishing their segments.
 
 ### Schema/Configuration Changes
 
@@ -572,22 +574,23 @@ In this way, configuration changes can be applied without 
requiring any pause in
 #### On the Subject of Segments
 
 Each Kinesis Indexing Task puts events consumed from Kinesis Shards assigned 
to it in a single segment for each segment
-granular interval until maxRowsPerSegment, maxTotalRows or 
intermediateHandoffPeriod limit is reached, at this point a new shard
+granular interval until maxRowsPerSegment, maxTotalRows or 
intermediateHandoffPeriod limit is reached. At this point, a new shard
 for this segment granularity is created for further events. Kinesis Indexing 
Task also does incremental hand-offs which
 means that all the segments created by a task will not be held up till the 
task duration is over. As soon as maxRowsPerSegment,
 maxTotalRows or intermediateHandoffPeriod limit is hit, all the segments held 
by the task at that point in time will be handed-off
 and new set of segments will be created for further events. This means that 
the task can run for longer durations of time
-without accumulating old segments locally on Middle Manager processes and it 
is encouraged to do so.
+without accumulating old segments locally on Middle Manager processes, and it 
is encouraged to do so.
 
-Kinesis Indexing Service may still produce some small segments. Lets say the 
task duration is 4 hours, segment granularity
-is set to an HOUR and Supervisor was started at 9:10 then after 4 hours at 
13:10, new set of tasks will be started and
-events for the interval 13:00 - 14:00 may be split across previous and new set 
of tasks. If you see it becoming a problem then
+Kinesis Indexing Service may still produce some small segments. Let's say the 
task duration is 4 hours, segment granularity
+is set to an HOUR and Supervisor was started at 9:10. Then after 4 hours at 
13:10, the new set of tasks will be started and
+events for the interval 13:00 - 14:00 may be split across the previous and the 
new set of tasks. If you see it becoming a problem then
 one can schedule re-indexing tasks be run to merge segments together into new 
segments of an ideal size (in the range of ~500-700 MB per segment).
 Details on how to optimize the segment size can be found on [Segment size 
optimization](../../operations/segment-optimization.md).
 There is also ongoing work to support automatic segment compaction of sharded 
segments as well as compaction not requiring
 Hadoop (see [here](https://github.com/apache/druid/pull/5102)).
 
 ### Determining Fetch Settings
+
 Kinesis indexing tasks fetch records using `fetchThreads` threads.
 If `fetchThreads` is higher than the number of Kinesis shards, the excess 
threads are unused.
 Each fetch thread fetches up to `recordsPerFetch` records at once from a 
Kinesis shard, with a delay between fetches
@@ -621,11 +624,13 @@ If the above limits are exceeded, Kinesis throws 
ProvisionedThroughputExceededEx
 Kinesis tasks pause by `fetchDelayMillis` or 3 seconds, whichever is larger, 
and then attempt the call again.
 
 In most cases, the default settings for fetch parameters are sufficient to 
achieve good performance without excessive
-memory usage. However, in some cases, you may need to adjust these parameters 
in order to more finely control fetch rate
-and memory usage. Optimal values depend on the average size of a record and 
the number of consumers you have reading
-from a given shard, which will be `replicas` unless you have other consumers 
also reading from this Kinesis stream.
+memory usage. However, in some cases, you may need to adjust these parameters 
to control fetch rate
+and memory usage more finely. Optimal values depend on the average size of a 
record and the number of consumers you
+have reading from a given shard, which will be `replicas` unless you have 
other consumers also reading from this
+Kinesis stream.
 
 ## Deaggregation
+
 The Kinesis indexing service supports de-aggregation of multiple rows packed 
into a single record by the Kinesis
 Producer Library's aggregate method for more efficient data transfer.
 
@@ -635,10 +640,11 @@ To enable this feature, set `deaggregate` to true in your 
`ioConfig` when submit
 
 When changing the shard count for a Kinesis stream, there will be a window of 
time around the resharding operation with early shutdown of Kinesis ingestion 
tasks and possible task failures.
 
-The early shutdowns and task failures are expected, and they occur because the 
supervisor will update the shard -> task group mappings as shards are closed 
and fully read, to ensure that tasks are not running 
-with an assignment of closed shards that have been fully read and to ensure a 
balanced distribution of active shards across tasks. 
+The early shutdowns and task failures are expected. They occur because the 
supervisor updates the shard to task group mappings as shards are closed and 
fully read. This ensures that tasks are not running
+with an assignment of closed shards that have been fully read and balances 
distribution of active shards across tasks.
 
 This window with early task shutdowns and possible task failures will conclude 
when:
+
 - All closed shards have been fully read and the Kinesis ingestion tasks have 
published the data from those shards, committing the "closed" state to metadata 
storage
 - Any remaining tasks that had inactive shards in the assignment have been 
shutdown (these tasks would have been created before the closed shards were 
completely drained)
 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[druid] branch master updated: Docs: Typo and language cleanup in Kinesis ingestion docs (#14356)

Reply via email to