kfaraz commented on code in PR #14356:
URL: https://github.com/apache/druid/pull/14356#discussion_r1210994575


##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -284,7 +284,7 @@ The `tuningConfig` is optional. If no `tuningConfig` is 
specified, default param
 |`indexSpecForIntermediatePersists`|Object|Defines segment storage format 
options to be used at indexing time for intermediate persisted temporary 
segments. This can be used to disable dimension/metric compression on 
intermediate segments to reduce memory required for final merging. However, 
disabling compression on intermediate segments might increase page cache use 
while they are used before getting merged into final segment published, see 
[IndexSpec](#indexspec) for possible values.| no (default = same as 
`indexSpec`)|
 |`reportParseExceptions`|Boolean|If true, exceptions encountered during 
parsing will be thrown and will halt ingestion; if false, unparseable rows and 
fields will be skipped.|no (default == false)|
 |`handoffConditionTimeout`|Long| Milliseconds to wait for segment handoff. It 
must be >= 0, where 0 means to wait forever.| no (default == 0)|
-|`resetOffsetAutomatically`|Boolean|Controls behavior when Druid needs to read 
Kinesis messages that are no longer available.<br/><br/>If false, the exception 
will bubble up, which will cause your tasks to fail and ingestion to halt. If 
this occurs, manual intervention is required to correct the situation; 
potentially using the [Reset Supervisor 
API](../../api-reference/api-reference.md#supervisors). This mode is useful for 
production, since it will make you aware of issues with ingestion.<br/><br/>If 
true, Druid will automatically reset to the earlier or latest sequence number 
available in Kinesis, based on the value of the `useEarliestSequenceNumber` 
property (earliest if true, latest if false). Please note that this can lead to 
data being _DROPPED_ (if `useEarliestSequenceNumber` is false) or _DUPLICATED_ 
(if `useEarliestSequenceNumber` is true) without your knowledge. Messages will 
be logged indicating that a reset has occurred, but ingestion will continue. 
This mode is useful for non-production situations, since it will make Druid attempt to recover from problems automatically, even if they lead to quiet dropping or duplicating of data.|no (default == false)|
+|`resetOffsetAutomatically`|Boolean|Controls behavior when Druid needs to read 
Kinesis messages that are no longer available.<br/><br/>If false, the exception 
will bubble up, which will cause your tasks to fail and ingestion to halt. If 
this occurs, manual intervention is required to correct the situation; 
potentially using the [Reset Supervisor 
API](../../api-reference/api-reference.md#supervisors). This mode is useful for 
production, since it will make you aware of issues with ingestion.<br/><br/>If 
true, Druid will automatically reset to the earlier or latest sequence number 
available in Kinesis, based on the value of the `useEarliestSequenceNumber` 
property (earliest if true, latest if false). Please note that this can lead to 
data being *DROPPED* (if `useEarliestSequenceNumber` is false) or *DUPLICATED* 
(if `useEarliestSequenceNumber` is true) without your knowledge. Messages will 
be logged indicating that a reset has occurred, but ingestion will continue. 
This mode is useful for non-production situations, since it will make Druid attempt to recover from problems automatically, even if they lead to quiet dropping or duplicating of data.|no (default == false)|

Review Comment:
   Could you please indicate what has changed in this line? The diff doesn't show it clearly. The only change I could find is `_DROPPED_` to `*DROPPED*`, which is equivalent markdown as far as I know. If that is the case, it would be best to revert this change, since it affects neither readability nor the rendered HTML.
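   For context, the two properties discussed in this row interact as follows; the fragment below is an illustrative sketch of a Kinesis supervisor spec (values chosen only to show the pairing, not recommended settings):

   ```json
   {
     "tuningConfig": {
       "type": "kinesis",
       "resetOffsetAutomatically": true
     },
     "ioConfig": {
       "useEarliestSequenceNumber": false
     }
   }
   ```

   With this combination, unreadable messages would cause an automatic reset to the latest sequence number, so intervening records could be silently dropped, as the doc text warns.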



##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -352,14 +353,14 @@ look for credentials set in environment variables, via 
[Web Identity Token](http
 profile provider (in this order).
 
 To ingest data from Kinesis, ensure that the policy attached to your IAM role 
contains the necessary permissions.
-The permissions needed depend on the value of `useListShards`.
+The required permissions depend on the value of `useListShards`.
 
 If the `useListShards` flag is set to `true`, you need following permissions:
 
-* `ListStreams`: required to list your data streams
-* `Get*`: required for `GetShardIterator`
-* `GetRecords`: required to get data records from a data stream's shard
-* `ListShards` : required to get the shards for a stream of interest
+- `ListStreams`: required to list your data streams

Review Comment:
   Both `*` and `-` are valid markdown list markers, so we can probably skip this change. Other places in the docs might be using `*` for lists too, so I don't think consistency is a real requirement here.



##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -380,14 +381,14 @@ If the `useListShards` flag is set to `true`, you need 
following permissions:
 
 If the `useListShards` flag is set to `false`, you need following permissions:
 
-* `ListStreams`: required to list your data streams
-* `Get*`: required for `GetShardIterator`
-* `GetRecords`: required to get data records from a data stream's shard
-* `DescribeStream`: required to describe the specified data stream
+- `ListStreams`: required to list your data streams
+- `Get*`: required for `GetShardIterator`
+- `GetRecords`: required to get data records from a data stream's shard
+- `DescribeStream`: required to describe the specified data stream
 
 **Example policy**
 
-```    
+```

Review Comment:
   Maybe use ` ```json ` here so the example policy gets JSON syntax highlighting?
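   For illustration, a fenced block tagged `json` would render the policy with highlighting. The snippet below is a hypothetical minimal policy covering the actions listed in this section; the `Resource` ARN is a placeholder, not taken from the docs:

   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Action": [
           "kinesis:ListStreams",
           "kinesis:Get*",
           "kinesis:DescribeStream"
         ],
         "Resource": "arn:aws:kinesis:*:*:stream/*"
       }
     ]
   }
   ```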



##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -302,7 +302,7 @@ The `tuningConfig` is optional. If no `tuningConfig` is 
specified, default param
 |`maxParseExceptions`|Integer|The maximum number of parse exceptions that can 
occur before the task halts ingestion and fails. Overridden if 
`reportParseExceptions` is set.|no, unlimited default|
 |`maxSavedParseExceptions`|Integer|When a parse exception occurs, Druid can 
keep track of the most recent parse exceptions. "maxSavedParseExceptions" 
limits how many exception instances will be saved. These saved exceptions will 
be made available after the task finishes in the [task completion 
report](../../ingestion/tasks.md#task-reports). Overridden if 
`reportParseExceptions` is set.|no, default == 0|
 |`maxRecordsPerPoll`|Integer|The maximum number of records/events to be 
fetched from buffer per poll. The actual maximum will be 
`Max(maxRecordsPerPoll, Max(bufferSize, 1))`|no (see [Determining fetch 
settings](#determining-fetch-settings) for defaults)|
-|`repartitionTransitionDuration`|ISO8601 Period|When shards are split or 
merged, the supervisor will recompute shard -> task group mappings, and signal 
any running tasks created under the old mappings to stop early at (current time 
+ `repartitionTransitionDuration`). Stopping the tasks early allows Druid to 
begin reading from the new shards more quickly. The repartition transition wait 
time controlled by this property gives the stream additional time to write 
records to the new shards after the split/merge, which helps avoid the issues 
with empty shard handling described at 
https://github.com/apache/druid/issues/7600.|no, (default == PT2M)|
+|`repartitionTransitionDuration`|ISO8601 Period|When shards are split or 
merged, the supervisor will recompute shard -> task group mappings, and signal 
any running tasks created under the old mappings to stop early at (current time 
+ `repartitionTransitionDuration`). Stopping the tasks early allows Druid to 
begin reading from the new shards more quickly. The repartition transition wait 
time controlled by this property gives the stream additional time to write 
records to the new shards after the split/merge, which helps avoid the issues 
with empty shard handling described at 
<https://github.com/apache/druid/issues/7600>.|no, (default == PT2M)|

Review Comment:
   ```suggestion
   |`repartitionTransitionDuration`|ISO8601 Period|When shards are split or 
merged, the supervisor will recompute shard -> task group mappings, and signal 
any running tasks created under the old mappings to stop early at (current time 
+ `repartitionTransitionDuration`). Stopping the tasks early allows Druid to 
begin reading from the new shards more quickly. The repartition transition wait 
time controlled by this property gives the stream additional time to write 
records to the new shards after the split/merge, which helps avoid the issues 
with empty shard handling described [in this 
issue](https://github.com/apache/druid/issues/7600).|no, (default == PT2M)|
   ```
   
   Maybe use descriptive text hyperlinked to the URL instead of the bare link.
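   Side note on the `PT2M` default mentioned in this row: it is an ISO 8601 duration (two minutes). As a minimal sketch, a duration like this could be converted to seconds as below; the function name and regex are illustrative only, not Druid code (Druid parses these periods internally via Joda-Time):

   ```python
   import re

   def iso8601_period_to_seconds(period: str) -> int:
       """Convert a simple ISO 8601 time-only duration (e.g. PT2M) to seconds."""
       m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", period)
       if m is None or not any(m.groups()):
           raise ValueError(f"unsupported period: {period}")
       hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
       return hours * 3600 + minutes * 60 + seconds

   print(iso8601_period_to_seconds("PT2M"))  # 120
   ```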



##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -477,25 +479,25 @@ it will just ensure that no indexing tasks are running 
until the supervisor is r
 
 ### Resetting Supervisors
 
-The `POST /druid/indexer/v1/supervisor/<supervisorId>/reset` operation clears 
stored 
-sequence numbers, causing the supervisor to start reading from either the 
earliest or 
-latest sequence numbers in Kinesis (depending on the value of 
`useEarliestSequenceNumber`). 
-After clearing stored sequence numbers, the supervisor kills and recreates 
active tasks, 
-so that tasks begin reading from valid sequence numbers. 
+The `POST /druid/indexer/v1/supervisor/<supervisorId>/reset` operation clears 
stored

Review Comment:
   I would suggest leaving out this change, as it only removes trailing spaces. It would be better to do this whenever we next update the content in this section.



##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -635,10 +639,11 @@ To enable this feature, set `deaggregate` to true in your 
`ioConfig` when submit
 
 When changing the shard count for a Kinesis stream, there will be a window of 
time around the resharding operation with early shutdown of Kinesis ingestion 
tasks and possible task failures.
 
-The early shutdowns and task failures are expected, and they occur because the 
supervisor will update the shard -> task group mappings as shards are closed 
and fully read, to ensure that tasks are not running 
-with an assignment of closed shards that have been fully read and to ensure a 
balanced distribution of active shards across tasks. 
+The early shutdowns and task failures are expected, and they occur because the 
supervisor will update the shard -> task group mappings as shards are closed 
and fully read, to ensure that tasks are not running

Review Comment:
   Same comment about trailing spaces.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

