[GitHub] [druid] ektravel commented on a diff in pull request #14356: Minor cleanups from working on sampling bugfix

via GitHub Thu, 01 Jun 2023 08:38:10 -0700


ektravel commented on code in PR #14356:
URL: https://github.com/apache/druid/pull/14356#discussion_r1213345582



##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -284,25 +284,25 @@ The `tuningConfig` is optional. If no `tuningConfig` is 
specified, default param
 |`indexSpecForIntermediatePersists`|Object|Defines segment storage format 
options to be used at indexing time for intermediate persisted temporary 
segments. This can be used to disable dimension/metric compression on 
intermediate segments to reduce memory required for final merging. However, 
disabling compression on intermediate segments might increase page cache use 
while they are used before getting merged into final segment published, see 
[IndexSpec](#indexspec) for possible values.| no (default = same as 
`indexSpec`)|
 |`reportParseExceptions`|Boolean|If true, exceptions encountered during 
parsing will be thrown and will halt ingestion; if false, unparseable rows and 
fields will be skipped.|no (default == false)|
 |`handoffConditionTimeout`|Long| Milliseconds to wait for segment handoff. It 
must be >= 0, where 0 means to wait forever.| no (default == 0)|
-|`resetOffsetAutomatically`|Boolean|Controls behavior when Druid needs to read 
Kinesis messages that are no longer available.<br/><br/>If false, the exception 
will bubble up, which will cause your tasks to fail and ingestion to halt. If 
this occurs, manual intervention is required to correct the situation; 
potentially using the [Reset Supervisor 
API](../../api-reference/api-reference.md#supervisors). This mode is useful for 
production, since it will make you aware of issues with ingestion.<br/><br/>If 
true, Druid will automatically reset to the earlier or latest sequence number 
available in Kinesis, based on the value of the `useEarliestSequenceNumber` 
property (earliest if true, latest if false). Please note that this can lead to 
data being _DROPPED_ (if `useEarliestSequenceNumber` is false) or _DUPLICATED_ 
(if `useEarliestSequenceNumber` is true) without your knowledge. Messages will 
be logged indicating that a reset has occurred, but ingestion will continue. 
This mode is useful for non-production situations, since it will make Druid 
attempt to recover from 
problems automatically, even if they lead to quiet dropping or duplicating of 
data.|no (default == false)|
+|`resetOffsetAutomatically`|Boolean|Controls behavior when Druid needs to read 
Kinesis messages that are no longer available.<br/><br/>If false, the exception 
will bubble up, which will cause your tasks to fail and ingestion to halt. If 
this occurs, manual intervention is required to correct the situation, 
potentially using the [Reset Supervisor 
API](../../api-reference/api-reference.md#supervisors). This mode is useful for 
production, since it will make you aware of issues with ingestion.<br/><br/>If 
true, Druid will automatically reset to the earliest or latest sequence number 
available in Kinesis, based on the value of the `useEarliestSequenceNumber` 
property (earliest if true, latest if false). Please note that this can lead to 
data being *DROPPED* (if `useEarliestSequenceNumber` is false) or *DUPLICATED* 
(if `useEarliestSequenceNumber` is true) without your knowledge. Messages will 
be logged indicating that a reset has occurred, but ingestion will continue. 
This mode is useful for non-production situations since it will make Druid 
attempt to recover from 
problems automatically, even if they lead to quiet dropping or duplicating of 
data.|no (default == false)|
 |`skipSequenceNumberAvailabilityCheck`|Boolean|Whether to enable checking if 
the current sequence number is still available in a particular Kinesis shard. 
If set to false, the indexing task will attempt to reset the current sequence 
number (or not), depending on the value of `resetOffsetAutomatically`.|no 
(default == false)|
 |`workerThreads`|Integer|The number of threads that the supervisor uses to 
handle requests/responses for worker tasks, along with any other internal 
asynchronous operation.|no (default == min(10, taskCount))|
-|`chatAsync`|Boolean| If true, use asynchronous communication with indexing 
tasks, and ignore the `chatThreads` parameter. If false, use synchronous 
communication in a thread pool of size `chatThreads`.                           
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                     | no (default == true)                                     
                           |
+|`chatAsync`|Boolean| If true, use asynchronous communication with indexing 
tasks, and ignore the `chatThreads` parameter. If false, use synchronous 
communication in a thread pool of size `chatThreads`.| no (default == true)|
 |`chatThreads`|Integer| The number of threads that will be used for 
communicating with indexing tasks. Ignored if `chatAsync` is `true` (the 
default).| no (default == min(10, taskCount * replicas))|
 |`chatRetries`|Integer|The number of times HTTP requests to indexing tasks 
will be retried before considering tasks unresponsive.| no (default == 8)|
 |`httpTimeout`|ISO8601 Period|How long to wait for a HTTP response from an 
indexing task.|no (default == PT10S)|
 |`shutdownTimeout`|ISO8601 Period|How long to wait for the supervisor to 
attempt a graceful shutdown of tasks before exiting.|no (default == PT80S)|
 |`recordBufferSize`|Integer|Size of the buffer (number of events) used between 
the Kinesis fetch threads and the main ingestion thread.|no (see [Determining 
fetch settings](#determining-fetch-settings) for defaults)|
 |`recordBufferOfferTimeout`|Integer|Length of time in milliseconds to wait for 
space to become available in the buffer before timing out.| no (default == 
5000)|
 |`recordBufferFullWait`|Integer|Length of time in milliseconds to wait for the 
buffer to drain before attempting to fetch records from Kinesis again.|no 
(default == 5000)|
-|`fetchThreads`|Integer|Size of the pool of threads fetching data from 
Kinesis. There is no benefit in having more threads than Kinesis shards.|no 
(default == procs * 2, where "procs" is the number of processors available to 
the task)                                                                  |
+|`fetchThreads`|Integer|Size of the pool of threads fetching data from 
Kinesis. There is no benefit in having more threads than Kinesis shards.|no 
(default == procs * 2, where "procs" is the number of processors available to 
the task)|
 |`segmentWriteOutMediumFactory`|Object|Segment write-out medium to use when 
creating segments. See below for more information.|no (not specified by 
default, the value from `druid.peon.defaultSegmentWriteOutMediumFactory.type` 
is used)|
 |`intermediateHandoffPeriod`|ISO8601 Period|How often the tasks should hand 
off segments. Handoff will happen either if `maxRowsPerSegment` or 
`maxTotalRows` is hit or every `intermediateHandoffPeriod`, whichever happens 
earlier.| no (default == P2147483647D)|
 |`logParseExceptions`|Boolean|If true, log an error message when a parsing 
exception occurs, containing information about the row where the error 
occurred.|no, default == false|
 |`maxParseExceptions`|Integer|The maximum number of parse exceptions that can 
occur before the task halts ingestion and fails. Overridden if 
`reportParseExceptions` is set.|no, unlimited default|
 |`maxSavedParseExceptions`|Integer|When a parse exception occurs, Druid can 
keep track of the most recent parse exceptions. "maxSavedParseExceptions" 
limits how many exception instances will be saved. These saved exceptions will 
be made available after the task finishes in the [task completion 
report](../../ingestion/tasks.md#task-reports). Overridden if 
`reportParseExceptions` is set.|no, default == 0|
 |`maxRecordsPerPoll`|Integer|The maximum number of records/events to be 
fetched from buffer per poll. The actual maximum will be 
`Max(maxRecordsPerPoll, Max(bufferSize, 1))`|no (see [Determining fetch 
settings](#determining-fetch-settings) for defaults)|
-|`repartitionTransitionDuration`|ISO8601 Period|When shards are split or 
merged, the supervisor will recompute shard -> task group mappings, and signal 
any running tasks created under the old mappings to stop early at (current time 
+ `repartitionTransitionDuration`). Stopping the tasks early allows Druid to 
begin reading from the new shards more quickly. The repartition transition wait 
time controlled by this property gives the stream additional time to write 
records to the new shards after the split/merge, which helps avoid the issues 
with empty shard handling described at 
https://github.com/apache/druid/issues/7600.|no, (default == PT2M)|
+|`repartitionTransitionDuration`|ISO8601 Period|When shards are split or 
merged, the supervisor will recompute shard -> task group mappings, and signal 
any running tasks created under the old mappings to stop early at (current time 
+ `repartitionTransitionDuration`). Stopping the tasks early allows Druid to 
begin reading from the new shards more quickly. The repartition transition wait 
time controlled by this property gives the stream additional time to write 
records to the new shards after the split/merge, which helps avoid issues with 
[empty shard handling](https://github.com/apache/druid/issues/7600).|no, 
(default == PT2M)|

Review Comment:
   ```suggestion
   |`repartitionTransitionDuration`|ISO8601 period|When shards are split or 
merged, the supervisor recomputes shard to task group mappings. The supervisor 
also signals any running tasks created under the old mappings to stop early at 
(current time + `repartitionTransitionDuration`). Stopping the tasks early 
allows Druid to begin reading from the new shards more quickly. The repartition 
transition wait time controlled by this property gives the stream additional 
time to write records to the new shards after the split or merge, which helps 
avoid issues with [empty shard 
handling](https://github.com/apache/druid/issues/7600).|no, (default == PT2M)|
   ```
   Removed future tense. 
   Consider removing the parentheses from "(current time + 
`repartitionTransitionDuration`)". Maybe rewrite that sentence so that the 
parentheses are not needed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

