ektravel commented on code in PR #14356:
URL: https://github.com/apache/druid/pull/14356#discussion_r1213345897
##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -284,25 +284,25 @@ The `tuningConfig` is optional. If no `tuningConfig` is
specified, default param
|`indexSpecForIntermediatePersists`|Object|Defines segment storage format
options to be used at indexing time for intermediate persisted temporary
segments. This can be used to disable dimension/metric compression on
intermediate segments to reduce memory required for final merging. However,
disabling compression on intermediate segments might increase page cache use
while they are used before getting merged into final segment published, see
[IndexSpec](#indexspec) for possible values.| no (default = same as
`indexSpec`)|
|`reportParseExceptions`|Boolean|If true, exceptions encountered during
parsing will be thrown and will halt ingestion; if false, unparseable rows and
fields will be skipped.|no (default == false)|
|`handoffConditionTimeout`|Long| Milliseconds to wait for segment handoff. It
must be >= 0, where 0 means to wait forever.| no (default == 0)|
-|`resetOffsetAutomatically`|Boolean|Controls behavior when Druid needs to read
Kinesis messages that are no longer available.<br/><br/>If false, the exception
will bubble up, which will cause your tasks to fail and ingestion to halt. If
this occurs, manual intervention is required to correct the situation;
potentially using the [Reset Supervisor
API](../../api-reference/api-reference.md#supervisors). This mode is useful for
production, since it will make you aware of issues with ingestion.<br/><br/>If
true, Druid will automatically reset to the earlier or latest sequence number
available in Kinesis, based on the value of the `useEarliestSequenceNumber`
property (earliest if true, latest if false). Please note that this can lead to
data being _DROPPED_ (if `useEarliestSequenceNumber` is false) or _DUPLICATED_
(if `useEarliestSequenceNumber` is true) without your knowledge. Messages will
be logged indicating that a reset has occurred, but ingestion will continue.
This mode is useful f
or non-production situations, since it will make Druid attempt to recover from
problems automatically, even if they lead to quiet dropping or duplicating of
data.|no (default == false)|
+|`resetOffsetAutomatically`|Boolean|Controls behavior when Druid needs to read
Kinesis messages that are no longer available.<br/><br/>If false, the exception
will bubble up, which will cause your tasks to fail and ingestion to halt. If
this occurs, manual intervention is required to correct the situation,
potentially using the [Reset Supervisor
API](../../api-reference/api-reference.md#supervisors). This mode is useful for
production, since it will make you aware of issues with ingestion.<br/><br/>If
true, Druid will automatically reset to the earliest or latest sequence number
available in Kinesis, based on the value of the `useEarliestSequenceNumber`
property (earliest if true, latest if false). Please note that this can lead to
data being *DROPPED* (if `useEarliestSequenceNumber` is false) or *DUPLICATED*
(if `useEarliestSequenceNumber` is true) without your knowledge. Messages will
be logged indicating that a reset has occurred, but ingestion will continue.
This mode is useful
for non-production situations since it will make Druid attempt to recover from
problems automatically, even if they lead to quiet dropping or duplicating of
data.|no (default == false)|
|`skipSequenceNumberAvailabilityCheck`|Boolean|Whether to enable checking if
the current sequence number is still available in a particular Kinesis shard.
If set to false, the indexing task will attempt to reset the current sequence
number (or not), depending on the value of `resetOffsetAutomatically`.|no
(default == false)|
|`workerThreads`|Integer|The number of threads that the supervisor uses to
handle requests/responses for worker tasks, along with any other internal
asynchronous operation.|no (default == min(10, taskCount))|
-|`chatAsync`|Boolean| If true, use asynchronous communication with indexing
tasks, and ignore the `chatThreads` parameter. If false, use synchronous
communication in a thread pool of size `chatThreads`.
| no (default == true)
|
+|`chatAsync`|Boolean| If true, use asynchronous communication with indexing
tasks, and ignore the `chatThreads` parameter. If false, use synchronous
communication in a thread pool of size `chatThreads`.| no (default == true)|
|`chatThreads`|Integer| The number of threads that will be used for
communicating with indexing tasks. Ignored if `chatAsync` is `true` (the
default).| no (default == min(10, taskCount * replicas))|
|`chatRetries`|Integer|The number of times HTTP requests to indexing tasks
will be retried before considering tasks unresponsive.| no (default == 8)|
|`httpTimeout`|ISO8601 Period|How long to wait for a HTTP response from an
indexing task.|no (default == PT10S)|
|`shutdownTimeout`|ISO8601 Period|How long to wait for the supervisor to
attempt a graceful shutdown of tasks before exiting.|no (default == PT80S)|
|`recordBufferSize`|Integer|Size of the buffer (number of events) used between
the Kinesis fetch threads and the main ingestion thread.|no (see [Determining
fetch settings](#determining-fetch-settings) for defaults)|
|`recordBufferOfferTimeout`|Integer|Length of time in milliseconds to wait for
space to become available in the buffer before timing out.| no (default ==
5000)|
|`recordBufferFullWait`|Integer|Length of time in milliseconds to wait for the
buffer to drain before attempting to fetch records from Kinesis again.|no
(default == 5000)|
-|`fetchThreads`|Integer|Size of the pool of threads fetching data from
Kinesis. There is no benefit in having more threads than Kinesis shards.|no
(default == procs * 2, where "procs" is the number of processors available to
the task) |
+|`fetchThreads`|Integer|Size of the pool of threads fetching data from
Kinesis. There is no benefit in having more threads than Kinesis shards.|no
(default == procs * 2, where "procs" is the number of processors available to
the task)|
|`segmentWriteOutMediumFactory`|Object|Segment write-out medium to use when
creating segments. See below for more information.|no (not specified by
default, the value from `druid.peon.defaultSegmentWriteOutMediumFactory.type`
is used)|
|`intermediateHandoffPeriod`|ISO8601 Period|How often the tasks should hand
off segments. Handoff will happen either if `maxRowsPerSegment` or
`maxTotalRows` is hit or every `intermediateHandoffPeriod`, whichever happens
earlier.| no (default == P2147483647D)|
|`logParseExceptions`|Boolean|If true, log an error message when a parsing
exception occurs, containing information about the row where the error
occurred.|no, default == false|
|`maxParseExceptions`|Integer|The maximum number of parse exceptions that can
occur before the task halts ingestion and fails. Overridden if
`reportParseExceptions` is set.|no, unlimited default|
|`maxSavedParseExceptions`|Integer|When a parse exception occurs, Druid can
keep track of the most recent parse exceptions. "maxSavedParseExceptions"
limits how many exception instances will be saved. These saved exceptions will
be made available after the task finishes in the [task completion
report](../../ingestion/tasks.md#task-reports). Overridden if
`reportParseExceptions` is set.|no, default == 0|
|`maxRecordsPerPoll`|Integer|The maximum number of records/events to be
fetched from buffer per poll. The actual maximum will be
`Max(maxRecordsPerPoll, Max(bufferSize, 1))`|no (see [Determining fetch
settings](#determining-fetch-settings) for defaults)|
-|`repartitionTransitionDuration`|ISO8601 Period|When shards are split or
merged, the supervisor will recompute shard -> task group mappings, and signal
any running tasks created under the old mappings to stop early at (current time
+ `repartitionTransitionDuration`). Stopping the tasks early allows Druid to
begin reading from the new shards more quickly. The repartition transition wait
time controlled by this property gives the stream additional time to write
records to the new shards after the split/merge, which helps avoid the issues
with empty shard handling described at
https://github.com/apache/druid/issues/7600.|no, (default == PT2M)|
+|`repartitionTransitionDuration`|ISO8601 Period|When shards are split or
merged, the supervisor will recompute shard -> task group mappings, and signal
any running tasks created under the old mappings to stop early at (current time
+ `repartitionTransitionDuration`). Stopping the tasks early allows Druid to
begin reading from the new shards more quickly. The repartition transition wait
time controlled by this property gives the stream additional time to write
records to the new shards after the split/merge, which helps avoid issues with
[empty shard handling](https://github.com/apache/druid/issues/7600).|no,
(default == PT2M)|
|`offsetFetchPeriod`|ISO8601 Period|How often the supervisor queries Kinesis
and the indexing tasks to fetch current offsets and calculate lag. If the
user-specified value is below the minimum value (`PT5S`), the supervisor
ignores the value and uses the minimum value instead.|no (default == PT30S, min
== PT5S)|
Review Comment:
```suggestion
|`offsetFetchPeriod`|ISO8601 period|How often the supervisor queries Kinesis
and the indexing tasks to fetch current offsets and calculate lag. If the
user-specified value is below the minimum value (`PT5S`), the supervisor
ignores the value and uses the minimum value instead.|no (default == PT30S, min
== PT5S)|
```
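For context, here is a minimal sketch of how a few of the properties in this table fit together in a Kinesis supervisor `tuningConfig`. The values shown simply mirror the documented defaults and are illustrative only, not tuning recommendations:

```json
{
  "type": "kinesis",
  "resetOffsetAutomatically": false,
  "skipSequenceNumberAvailabilityCheck": false,
  "chatAsync": true,
  "httpTimeout": "PT10S",
  "offsetFetchPeriod": "PT30S",
  "repartitionTransitionDuration": "PT2M"
}
```

Note that with `chatAsync` left at its default of `true`, any `chatThreads` setting would be ignored, per the row above.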
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]