This is an automated email from the ASF dual-hosted git repository.
kfaraz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new 55effd92cf Docs: Typo and language cleanup in Kinesis ingestion docs (#14356)
55effd92cf is described below
commit 55effd92cfe80c23e1005e70a86da585764a7771
Author: Andreas Maechler
AuthorDate: Thu Jun 1 20:48:41 2023 -0600
Docs: Typo and language cleanup in Kinesis ingestion docs (#14356)
Co-authored-by: Katya Macedo
Co-authored-by: Kashif Faraz
Co-authored-by: Victoria Lim
---
.../extensions-core/kinesis-ingestion.md | 120 +++++++++++----------
1 file changed, 63 insertions(+), 57 deletions(-)
diff --git a/docs/development/extensions-core/kinesis-ingestion.md b/docs/development/extensions-core/kinesis-ingestion.md
index 046ffd2ad6..36e65f538c 100644
--- a/docs/development/extensions-core/kinesis-ingestion.md
+++ b/docs/development/extensions-core/kinesis-ingestion.md
@@ -24,10 +24,10 @@ sidebar_label: "Amazon Kinesis"
-->
When you enable the Kinesis indexing service, you can configure *supervisors*
on the Overlord to manage the creation and lifetime of Kinesis indexing tasks.
These indexing tasks read events using Kinesis' own shard and sequence number
mechanism to guarantee exactly-once ingestion. The supervisor oversees the
state of the indexing tasks to:
- - coordinate handoffs
- - manage failures
- - ensure that scalability and replication requirements are maintained.
+- coordinate handoffs
+- manage failures
+- ensure that scalability and replication requirements are maintained.
To use the Kinesis indexing service, load the `druid-kinesis-indexing-service`
core Apache Druid extension (see
[Including Extensions](../../configuration/extensions.md#loading-extensions)).
@@ -38,12 +38,11 @@ To use the Kinesis indexing service, load the
`druid-kinesis-indexing-service` c
To use the Kinesis indexing service, load the `druid-kinesis-indexing-service`
extension on both the Overlord and the MiddleManagers. Druid starts a
supervisor for a dataSource when you submit a supervisor spec. Submit your
supervisor spec to the following endpoint:
-
`http://<OVERLORD_IP>:<OVERLORD_PORT>/druid/indexer/v1/supervisor`
For example:
-```
+```sh
curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json
http://localhost:8090/druid/indexer/v1/supervisor
```
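
The same submission can be sketched in Python with only the standard library. This is an illustrative sketch and not part of the commit: the helper name `build_supervisor_request`, the `localhost:8090` Overlord address, and the abbreviated spec are assumptions. The request is built but not sent, so the snippet runs without a Druid cluster.

```python
import json
import urllib.request


def build_supervisor_request(overlord_url, spec):
    """Build (but do not send) a POST request that submits a supervisor spec."""
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        url=f"{overlord_url}/druid/indexer/v1/supervisor",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical, heavily abbreviated spec; a real spec also needs the
# fields described in the supervisor spec tables of the documentation.
example_spec = {"type": "kinesis"}
request = build_supervisor_request("http://localhost:8090", example_spec)
print(request.get_method(), request.full_url)
# To actually submit against a running Overlord: urllib.request.urlopen(request)
```

Separating request construction from sending keeps the endpoint and headers easy to inspect before anything reaches the Overlord.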
@@ -111,7 +110,6 @@ Where the file `supervisor-spec.json` contains a Kinesis
supervisor spec:
}
```
-
## Supervisor Spec
|Field|Description|Required|
@@ -136,7 +134,7 @@ Where the file `supervisor-spec.json` contains a Kinesis
supervisor spec:
|`period`|ISO8601 Period|How often the supervisor will execute its management
logic. Note that the supervisor will also run in response to certain events
(such as tasks succeeding, failing, and reaching their taskDuration) so this
value specifies the maximum time between iterations.|no (default == PT30S)|
|`useEarliestSequenceNumber`|Boolean|If a supervisor is managing a dataSource
for the first time, it will obtain a set of starting sequence numbers from
Kinesis. This flag determines whether it retrieves the earliest or latest
sequence numbers in Kinesis. Under normal circumstances, subsequent tasks will
start from where the previous segments ended so this flag will only be used on
first run.|no (default == false)|
|`completionTimeout`|ISO8601 Period|The length of time to wait before
declaring a publishing task as failed and terminating it. If this is set too
low, your tasks may never publish. The publishing clock for a task begins
roughly after `taskDuration` elapses.|no (default == PT6H)|
-|`lateMessageRejectionPeriod`|ISO8601 Period|Configure tasks to reject
messages with timestamps earlier than this period before the task was created;
for example if this is set to `PT1H` and the supervisor creates a task at
*2016-01-01T12:00Z*, messages with timestamps earlier than *2016-01-01T11:00Z*
will be dropped. This may help prevent concurrency issues if your data stream
has late messages and you have multiple pipelines that need to operate on the
same segments (e.g. a realtime an [...]
+|`lateMessageRejectionPeriod`|ISO8601 Period|Configure tasks to reject
messages with timestamps earlier than this period before the task was created;
for example if this is set to `PT1H` and the supervisor creates a task at
*2016-01-01T12:00Z*, messages with timestamps earlier than *2016-01-01T11:00Z*
will be dropped. This may help prevent concurrency issues if your data stream
has late messages and you have multiple pipelines that need to operate on the
same segments (e.g. a streaming a [...]
|`earlyMessageRejectionPeriod`|ISO8601 Period|Configure tasks to reject
messages with timestamps later than this period after the task reached its
taskDuration; for example if this is set to `PT1H`, the taskDuration is set to
`PT1H` and the supervisor creates a task at *2016-01-01T12:00Z*. Messages with
timestamps later than *2016-01-01T14:00Z* will be dropped. **Note:** Tasks
sometimes run past their task duration, for example, in cases of supervisor
failover. Setting `earlyMessageRejec [...]
|`recordsPerFetch`|Integer|The number of records to request per call to fetch
records from Kinesis. See [Determining fetch
settings](#determining-fetch-settings).|no (see [Determining fetch
settings](#determining-fetch-settings) for defaults)|
|`fetchDelayMillis`|Integer|Time in milliseconds to wait between subsequent
calls to fetch records from Kinesis. See [Determining fetch
settings](#determining-fetch-settings).|no (default == 0)|
@@ -173,6 +171,7 @@ The Kinesis indexing service reports lag metrics measured
in time milliseconds r
| `scaleOutStep` | Number of tasks to add at a time when scaling out | no
(default == 2) |
The following example demonstrates a supervisor spec with `lagBased`
autoScaler enabled:
+
```json
{
"type": "kinesis",
@@ -255,7 +254,8 @@ The following example demonstrates a supervisor spec with
`lagBased` autoScaler
Kinesis indexing service supports both
[`inputFormat`](../../ingestion/data-formats.md#input-format) and
[`parser`](../../ingestion/data-formats.md#parser) to specify the data format.
Use the `inputFormat` to specify the data format for Kinesis indexing service
unless you need a format only supported by the legacy `parser`.
-Supported `inputFormat`s include:
+Supported values for `inputFormat` include:
+
- `csv`
- `delimited`
- `json`
@@ -284,10 +284,10 @@ The `tuningConfig` is optional. If no `tuningConfig` is
specified, default param
|`indexSpecForIntermediatePersists`|Object|Defines segment storage format
options to be used at indexing time for intermediate persisted temporary
segments. This can be used to disable dimension/metric compression on
intermediate segments to reduce memory required for final merging. However,
disabling compression on intermediate segments might increase page cache use
while they are used before getting merged into final segment published, see
[IndexSpec](#indexspec) for possible values.| [...]
|`reportParseExceptions`|Boolean|If true, exceptions encountered during
parsing will be thrown and will halt ingestion; if false, unparseable rows and
fields will be skipped.|no (default == false)|
|`handoffConditionTimeout`|Long| Milliseconds to wait for segment handoff. It
must be >= 0, where 0 means to wait forever.| no (default == 0)|
-|`resetOffsetAutomatically`|Boolean|Controls behavior when Druid needs to read
Kinesis messages that are no longer available.<br/><br/>If false, the exception
will bubble up, which will cause your tasks to fail and ingestion to halt. If
this occurs, manual intervention is required to correct the situation;
potentially using the [Reset Supervisor
API](../../api-reference/api-reference.md#supervisors). This mode is useful for
production, since it will make you aware of issues with ingestio [...]
+|`resetOffsetAutomatically`|Boolean|Controls behavior when Druid needs to read
Kinesis messages that are no longer available.<br/><br/>If false, the exception
bubbles up, causing tasks to fail and ingestion to halt. If this occurs, manual
intervention is required to correct the situation, potentially using the [Reset
Supervisor API](../../api-reference/api-reference.md#supervisors). This mode is
useful for production, since it highlights issues with ingestion.<br/><br/>If
true, Druid aut [...]
|`skipSequenceNumberAvailabilityCheck`|Boolean|Whether to enable checking if
the current sequence number is still available in a particular Kinesis shard.
If set to false, the indexing task will attempt to reset the current sequence
number (or not), depending on the value of `resetOffsetAutomatically`.|no
(default == false)|
|`workerThreads`|Integer|The number of threads that the supervisor uses to
handle requests/responses for worker tasks, along with any other internal
asynchronous operation.|no (default == min(10, taskCount))|
-|`chatAsync`|Boolean| If true, use asynchronous communication with indexing
tasks, and ignore the `chatThreads` parameter. If false, use synchronous
communication in a thread pool of size `chatThreads`.
[...]
+|`chatAsync`|Boolean| If true, the supervisor uses asynchronous communication
with indexing tasks and ignores the `chatThreads` parameter. If false, the
supervisor uses synchronous communication in a thread pool of size
`chatThreads`.| no (default == true)|
|`chatThreads`|Integer| The number of threads that will be used for
communicating with indexing tasks. Ignored if `chatAsync` is `true` (the
default).| no (default == min(10, taskCount * replicas))|
|`chatRetries`|Integer|The number of times HTTP requests to indexing tasks
will be retried before considering tasks unresponsive.| no (default == 8)|
|`httpTimeout`|ISO8601 Period|How long to wait for a HTTP response from an
indexing task.|no (default == PT10S)|
@@ -295,15 +295,15 @@ The `tuningConfig` is optional. If no `tuningConfig` is
specified, default param
|`recordBufferSize`|Integer|Size of the buffer (number of events) used between
the Kinesis fetch threads and the main ingestion thread.|no (see [Determining
fetch settings](#determining-fetch-settings) for defaults)|
|`recordBufferOfferTimeout`|Integer|Length of time in milliseconds to wait for
space to become available in the buffer before timing out.| no (default ==
5000)|
|`recordBufferFullWait`|Integer|Length of time in milliseconds to wait for the
buffer to drain before attempting to fetch records from Kinesis again.|no
(default == 5000)|
-|`fetchThreads`|Integer|Size of the pool of threads fetching data from
Kinesis. There is no benefit in having more threads than Kinesis shards.|no
(default == procs * 2, where "procs" is the number of processors available to
the task) |
+|`fetchThreads`|Integer|Size of the pool of threads fetching data from
Kinesis. There is no benefit in having more threads than Kinesis shards.|no
(default == procs * 2, where `procs` is the number of processors available to
the task)|
|`segmentWriteOutMediumFactory`|Object|Segment write-out medium to use when
creating segments. See below for more information.|no (not specified by
default, the value from `druid.peon.defaultSegmentWriteOutMediumFactory.type`
is used)|
|`intermediateHandoffPeriod`|ISO8601 Period|How often the tasks should hand
off segments. Handoff will happen either if `maxRowsPerSegment` or
`maxTotalRows` is hit or every `intermediateHandoffPeriod`, whichever happens
earlier.| no (default == P2147483647D)|
|`logParseExceptions`|Boolean|If true, log an error message when a parsing
exception occurs, containing information about the row where the error
occurred.|no, default == false|
|`maxParseExceptions`|Integer|The maximum number of parse exceptions that can
occur before the task halts ingestion and fails. Overridden if
`reportParseExceptions` is set.|no, unlimited default|
|`maxSavedParseExceptions`|Integer|When a parse exception occurs, Druid can
keep track of the most recent parse exceptions. "maxSavedParseExceptions"
limits how many exception instances will be saved. These saved exceptions will
be made available after the task finishes in the [task completion
report](../../ingestion/tasks.md#task-reports). Overridden if
`reportParseExceptions` is set.|no, default == 0|
|`maxRecordsPerPoll`|Integer|The maximum number of records/events to be
fetched from buffer per poll. The actual maximum will be
`Max(maxRecordsPerPoll, Max(bufferSize, 1))`|no (see [Determining fetch
settings](#determining-fetch-settings) for defaults)|
-|`repartitionTransitionDuration`|ISO8601 Period|When shards are split or
merged, the supervisor will recompute shard -> task group mappings, and signal
any running tasks created under the old mappings to stop early at (current time
+ `repartitionTransitionDuration`). Stopping the tasks early allows Druid to
begin reading from the new shards more quickly. The repartition transition wait
time controlled by this property gives the stream additional time to write
records to the new shards af [...]
-|`offsetFetchPeriod`|ISO8601 Period|How often the supervisor queries Kinesis
and the indexing tasks to fetch current offsets and calculate lag. If the
user-specified value is below the minimum value (`PT5S`), the supervisor
ignores the value and uses the minimum value instead.|no (default == PT30S, min
== PT5S)|
+|`repartitionTransitionDuration`|ISO8601 period|When shards are split or
merged, the supervisor recomputes shard to task group mappings. The supervisor
also signals any running tasks created under the old mappings to stop early at
(current time + `repartitionTransitionDuration`). Stopping the tasks early
allows Druid to begin reading from the new shards more quickly. The repartition
transition wait time controlled by this property gives the stream additional
time to write records to the [...]
+|`offsetFetchPeriod`|ISO8601 period|How often the supervisor queries Kinesis
and the indexing tasks to fetch current offsets and calculate lag. If the
user-specified value is below the minimum value (`PT5S`), the supervisor
ignores the value and uses the minimum value instead.|no (default == PT30S, min
== PT5S)|
|`useListShards`|Boolean|Indicates if `listShards` API of AWS Kinesis SDK can
be used to prevent `LimitExceededException` during ingestion. Please note that
the necessary `IAM` permissions must be set for this to work.|no (default ==
false)|
#### IndexSpec
@@ -343,7 +343,8 @@ For all supervisor APIs, check [Supervisor
APIs](../../api-reference/api-referen
### AWS Authentication
To authenticate with AWS, you must provide your AWS access key and AWS secret
key via `runtime.properties`, for example:
-```
+
+```text
-Ddruid.kinesis.accessKey=123 -Ddruid.kinesis.secretKey=456
```
@@ -352,18 +353,18 @@ look for credentials set in environment variables, via
[Web Identity Token](http
profile provider (in this order).
To ingest data from Kinesis, ensure that the policy attached to your IAM role
contains the necessary permissions.
-The permissions needed depend on the value of `useListShards`.
+The required permissions depend on the value of `useListShards`.
If the `useListShards` flag is set to `true`, you need the following permissions:
-* `ListStreams`: required to list your data streams
-* `Get*`: required for `GetShardIterator`
-* `GetRecords`: required to get data records from a data stream's shard
-* `ListShards` : required to get the shards for a stream of interest
+- `ListStreams`: required to list your data streams
+- `Get*`: required for `GetShardIterator`
+- `GetRecords`: required to get data records from a data stream's shard
+- `ListShards` : required to get the shards for a stream of interest
**Example policy**
-```
+```json
[
{
"Effect": "Allow",
@@ -380,14 +381,14 @@ If the `useListShards` flag is set to `true`, you need
following permissions:
If the `useListShards` flag is set to `false`, you need the following permissions:
-* `ListStreams`: required to list your data streams
-* `Get*`: required for `GetShardIterator`
-* `GetRecords`: required to get data records from a data stream's shard
-* `DescribeStream`: required to describe the specified data stream
+- `ListStreams`: required to list your data streams
+- `Get*`: required for `GetShardIterator`
+- `GetRecords`: required to get data records from a data stream's shard
+- `DescribeStream`: required to describe the specified data stream
**Example policy**
-```
+```json
[
{
"Effect": "Allow",
@@ -416,7 +417,7 @@ Indexing Service, Kinesis reports lag metrics measured in
time difference in mil
The status report also contains the supervisor's state and a list of recently
thrown exceptions (reported as
`recentErrors`, whose max size can be controlled using the
`druid.supervisor.maxStoredExceptionEvents` configuration).
There are two fields related to the supervisor's state - `state` and
`detailedState`. The `state` field will always be
-one of a small number of generic states that are applicable to any type of
supervisor, while the `detailedState` field
+one of a small number of generic states that apply to any type of supervisor,
while the `detailedState` field
will contain a more descriptive, implementation-specific state that may
provide more insight into the supervisor's
activities than the generic `state` field.
@@ -439,6 +440,7 @@ The list of `detailedState` values and their corresponding
`state` mapping is as
|STOPPING|STOPPING|The supervisor is stopping|
On each iteration of the supervisor's run loop, the supervisor completes the
following tasks in sequence:
+
1) Fetch the list of shards from Kinesis and determine the starting sequence
number for each shard (either based on the
last processed sequence number if continuing, or starting from the beginning
or ending of the stream if this is a new stream).
2) Discover any running indexing tasks that are writing to the supervisor's
datasource and adopt them if they match
@@ -477,25 +479,25 @@ it will just ensure that no indexing tasks are running
until the supervisor is r
### Resetting Supervisors
-The `POST /druid/indexer/v1/supervisor/<supervisorId>/reset` operation clears
stored
-sequence numbers, causing the supervisor to start reading from either the
earliest or
-latest sequence numbers in Kinesis (depending on the value of
`useEarliestSequenceNumber`).
-After clearing stored sequence numbers, the supervisor kills and recreates
active tasks,
-so that tasks begin reading from valid sequence numbers.
+The `POST /druid/indexer/v1/supervisor/<supervisorId>/reset` operation clears
stored
+sequence numbers, causing the supervisor to start reading from either the
earliest or
+latest sequence numbers in Kinesis (depending on the value of
`useEarliestSequenceNumber`).
+After clearing stored sequence numbers, the supervisor kills and recreates
active tasks,
+so that tasks begin reading from valid sequence numbers.
-Use care when using this operation! Resetting the supervisor may cause Kinesis
messages
-to be skipped or read twice, resulting in missing or duplicate data.
+Use care when using this operation! Resetting the supervisor may cause Kinesis
messages
+to be skipped or read twice, resulting in missing or duplicate data.
-The reason for using this operation is to recover from a state in which the
supervisor
-ceases operating due to missing sequence numbers. The indexing service keeps
track of the latest
-persisted sequence number in order to provide exactly-once ingestion
guarantees across
-tasks.
+The reason for using this operation is to recover from a state in which the
supervisor
+ceases operating due to missing sequence numbers. The indexing service keeps
track of the latest
+persisted sequence number to provide exactly-once ingestion guarantees across
+tasks.
-Subsequent tasks must start reading from where the previous task completed in
-order for the generated segments to be accepted. If the messages at the
expected starting sequence numbers are
-no longer available in Kinesis (typically because the message retention period
has elapsed or the topic was
-removed and re-created) the supervisor will refuse to start and in-flight
tasks will fail. This operation
-enables you to recover from this condition.
+Subsequent tasks must start reading from where the previous task completed
+for the generated segments to be accepted. If the messages at the expected
starting sequence numbers are
+no longer available in Kinesis (typically because the message retention period
has elapsed or the topic was
+removed and re-created) the supervisor will refuse to start and in-flight
tasks will fail. This operation
+enables you to recover from this condition.
Note that the supervisor must be running for this endpoint to be available.
@@ -514,7 +516,7 @@ Kinesis indexing tasks run on MiddleManagers and are thus
limited by the resourc
cluster. In particular, you should make sure that you have sufficient worker
capacity (configured using the
`druid.worker.capacity` property) to handle the configuration in the
supervisor spec. Note that worker capacity is
shared across all types of indexing tasks, so you should plan your worker
capacity to handle your total indexing load
-(e.g. batch processing, realtime tasks, merging tasks, etc.). If your workers
run out of capacity, Kinesis indexing tasks
+(e.g. batch processing, streaming tasks, merging tasks, etc.). If your workers
run out of capacity, Kinesis indexing tasks
will queue and wait for the next available worker. This may cause queries to
return partial results but will not result
in data loss (assuming the tasks run before Kinesis purges those sequence
numbers).
@@ -526,10 +528,10 @@ as it takes to generate segments, push segments to deep
storage, and have them b
The number of reading tasks is controlled by `replicas` and `taskCount`. In
general, there will be `replicas * taskCount`
reading tasks, the exception being if taskCount > {numKinesisShards} in which
case {numKinesisShards} tasks will
be used instead. When `taskDuration` elapses, these tasks will transition to
publishing state and `replicas * taskCount`
-new reading tasks will be created. Therefore to allow for reading tasks and
publishing tasks to run concurrently, there
+new reading tasks will be created. Therefore, to allow for reading tasks and
publishing tasks to run concurrently, there
should be a minimum capacity of:
-```
+```text
workerCapacity = 2 * replicas * taskCount
```
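
The capacity rule can be sketched as a small Python helper. This is a sketch under stated assumptions: the function name `min_worker_capacity` is hypothetical, and capping `taskCount` at the shard count is one reading of the exception described in the text, not a formula taken from Druid itself.

```python
def min_worker_capacity(replicas, task_count, num_kinesis_shards):
    """Minimum worker slots so reading and publishing tasks can run concurrently."""
    # Assumption: when taskCount exceeds the shard count, only
    # num_kinesis_shards task groups are actually used.
    effective_task_count = min(task_count, num_kinesis_shards)
    reading_tasks = replicas * effective_task_count
    # One full set of reading tasks plus one full set of publishing tasks.
    return 2 * reading_tasks


print(min_worker_capacity(replicas=2, task_count=3, num_kinesis_shards=8))   # 12
print(min_worker_capacity(replicas=2, task_count=10, num_kinesis_shards=4))  # 16
```

Because worker capacity is shared with other indexing tasks, the result is a lower bound for the Kinesis workload only.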
@@ -555,8 +557,8 @@ fail-overs.
A supervisor is stopped via the `POST
/druid/indexer/v1/supervisor/<supervisorId>/terminate` endpoint. This places a
tombstone marker in the database (to prevent the supervisor from being
reloaded on a restart) and then gracefully
shuts down the currently running supervisor. When a supervisor is shut down in
this way, it will instruct its
-managed tasks to stop reading and begin publishing their segments immediately.
The call to the shutdown endpoint will
-return after all tasks have been signalled to stop but before the tasks finish
publishing their segments.
+managed tasks to stop reading. The tasks will begin publishing their segments
immediately. The call to the shutdown
+endpoint will return after all tasks have been signalled to stop but before
the tasks finish publishing their segments.
### Schema/Configuration Changes
@@ -572,22 +574,23 @@ In this way, configuration changes can be applied without
requiring any pause in
#### On the Subject of Segments
Each Kinesis Indexing Task puts events consumed from Kinesis Shards assigned
to it in a single segment for each segment
-granular interval until maxRowsPerSegment, maxTotalRows or
intermediateHandoffPeriod limit is reached, at this point a new shard
+granular interval until maxRowsPerSegment, maxTotalRows or
intermediateHandoffPeriod limit is reached. At this point, a new shard
for this segment granularity is created for further events. Kinesis Indexing
Task also does incremental hand-offs which
means that all the segments created by a task will not be held up till the
task duration is over. As soon as maxRowsPerSegment,
maxTotalRows or intermediateHandoffPeriod limit is hit, all the segments held
by the task at that point in time will be handed-off
and new set of segments will be created for further events. This means that
the task can run for longer durations of time
-without accumulating old segments locally on Middle Manager processes and it
is encouraged to do so.
+without accumulating old segments locally on Middle Manager processes, and it
is encouraged to do so.
-Kinesis Indexing Service may still produce some small segments. Lets say the
task duration is 4 hours, segment granularity
-is set to an HOUR and Supervisor was started at 9:10 then after 4 hours at
13:10, new set of tasks will be started and
-events for the interval 13:00 - 14:00 may be split across previous and new set
of tasks. If you see it becoming a problem then
+Kinesis Indexing Service may still produce some small segments. Let's say the
task duration is 4 hours, segment granularity
+is set to an HOUR and Supervisor was started at 9:10. Then after 4 hours at
13:10, the new set of tasks will be started and
+events for the interval 13:00 - 14:00 may be split across the previous and the
new set of tasks. If you see it becoming a problem then
one can schedule re-indexing tasks to merge segments together into new
segments of an ideal size (in the range of ~500-700 MB per segment).
Details on how to optimize the segment size can be found on [Segment size
optimization](../../operations/segment-optimization.md).
There is also ongoing work to support automatic segment compaction of sharded
segments as well as compaction not requiring
Hadoop (see [here](https://github.com/apache/druid/pull/5102)).
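
The small-segment example above (4-hour task duration, HOUR granularity, supervisor started at 9:10) can be checked with a short Python sketch. The helper name `split_hour_interval` and the calendar date are assumptions chosen for illustration. The sketch only computes which hourly interval straddles the task rollover.

```python
from datetime import datetime, timedelta


def split_hour_interval(start, task_duration):
    """Return the hourly interval that straddles the task rollover, or None."""
    rollover = start + task_duration
    interval_start = rollover.replace(minute=0, second=0, microsecond=0)
    if rollover == interval_start:
        return None  # rollover falls on an hour boundary, so nothing is split
    return interval_start, interval_start + timedelta(hours=1)


start = datetime(2023, 6, 1, 9, 10)  # arbitrary date; only the time of day matters
interval = split_hour_interval(start, timedelta(hours=4))
print(interval[0].strftime("%H:%M"), "-", interval[1].strftime("%H:%M"))  # 13:00 - 14:00
```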
### Determining Fetch Settings
+
Kinesis indexing tasks fetch records using `fetchThreads` threads.
If `fetchThreads` is higher than the number of Kinesis shards, the excess
threads are unused.
Each fetch thread fetches up to `recordsPerFetch` records at once from a
Kinesis shard, with a delay between fetches
@@ -621,11 +624,13 @@ If the above limits are exceeded, Kinesis throws
ProvisionedThroughputExceededEx
Kinesis tasks pause by `fetchDelayMillis` or 3 seconds, whichever is larger,
and then attempt the call again.
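
The pause rule in the preceding sentence amounts to taking a maximum. A minimal Python sketch follows; the function name `throttle_pause_millis` is hypothetical, and the 3-second floor is the value stated in the text.

```python
THROTTLE_FLOOR_MILLIS = 3000  # the "3 seconds" floor stated in the text


def throttle_pause_millis(fetch_delay_millis):
    """Pause before retrying a throttled fetch: fetchDelayMillis or 3 s, whichever is larger."""
    return max(fetch_delay_millis, THROTTLE_FLOOR_MILLIS)


print(throttle_pause_millis(0))     # 3000
print(throttle_pause_millis(5000))  # 5000
```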
In most cases, the default settings for fetch parameters are sufficient to
achieve good performance without excessive
-memory usage. However, in some cases, you may need to adjust these parameters
in order to more finely control fetch rate
-and memory usage. Optimal values depend on the average size of a record and
the number of consumers you have reading
-from a given shard, which will be `replicas` unless you have other consumers
also reading from this Kinesis stream.
+memory usage. However, in some cases, you may need to adjust these parameters
to control fetch rate
+and memory usage more finely. Optimal values depend on the average size of a
record and the number of consumers you
+have reading from a given shard, which will be `replicas` unless you have
other consumers also reading from this
+Kinesis stream.
## Deaggregation
+
The Kinesis indexing service supports de-aggregation of multiple rows packed
into a single record by the Kinesis
Producer Library's aggregate method for more efficient data transfer.
@@ -635,10 +640,11 @@ To enable this feature, set `deaggregate` to true in your
`ioConfig` when submit
When changing the shard count for a Kinesis stream, there will be a window of
time around the resharding operation with early shutdown of Kinesis ingestion
tasks and possible task failures.
-The early shutdowns and task failures are expected, and they occur because the
supervisor will update the shard -> task group mappings as shards are closed
and fully read, to ensure that tasks are not running
-with an assignment of closed shards that have been fully read and to ensure a
balanced distribution of active shards across tasks.
+The early shutdowns and task failures are expected. They occur because the
supervisor updates the shard to task group mappings as shards are closed and
fully read. This ensures that tasks are not running
+with an assignment of closed shards that have been fully read and balances
distribution of active shards across tasks.
This window with early task shutdowns and possible task failures will conclude
when:
+
- All closed shards have been fully read and the Kinesis ingestion tasks have
published the data from those shards, committing the "closed" state to metadata
storage
- Any remaining tasks that had inactive shards in the assignment have been
shutdown (these tasks would have been created before the closed shards were
completely drained)