[GitHub] [druid] kfaraz commented on a diff in pull request #14356: Minor cleanups from working on sampling bugfix

via GitHub Tue, 30 May 2023 20:18:13 -0700


kfaraz commented on code in PR #14356:
URL: https://github.com/apache/druid/pull/14356#discussion_r1211042955



##########
docs/development/extensions-core/kinesis-ingestion.md:
##########
@@ -284,7 +284,7 @@ The `tuningConfig` is optional. If no `tuningConfig` is 
specified, default param
 |`indexSpecForIntermediatePersists`|Object|Defines segment storage format 
options to be used at indexing time for intermediate persisted temporary 
segments. This can be used to disable dimension/metric compression on 
intermediate segments to reduce memory required for final merging. However, 
disabling compression on intermediate segments might increase page cache use 
while they are used before getting merged into final segment published, see 
[IndexSpec](#indexspec) for possible values.| no (default = same as 
`indexSpec`)|
 |`reportParseExceptions`|Boolean|If true, exceptions encountered during 
parsing will be thrown and will halt ingestion; if false, unparseable rows and 
fields will be skipped.|no (default == false)|
 |`handoffConditionTimeout`|Long| Milliseconds to wait for segment handoff. It 
must be >= 0, where 0 means to wait forever.| no (default == 0)|
-|`resetOffsetAutomatically`|Boolean|Controls behavior when Druid needs to read 
Kinesis messages that are no longer available.<br/><br/>If false, the exception 
will bubble up, which will cause your tasks to fail and ingestion to halt. If 
this occurs, manual intervention is required to correct the situation; 
potentially using the [Reset Supervisor 
API](../../api-reference/api-reference.md#supervisors). This mode is useful for 
production, since it will make you aware of issues with ingestion.<br/><br/>If 
true, Druid will automatically reset to the earlier or latest sequence number 
available in Kinesis, based on the value of the `useEarliestSequenceNumber` 
property (earliest if true, latest if false). Please note that this can lead to 
data being _DROPPED_ (if `useEarliestSequenceNumber` is false) or _DUPLICATED_ 
(if `useEarliestSequenceNumber` is true) without your knowledge. Messages will 
be logged indicating that a reset has occurred, but ingestion will continue. 
This mode is useful for non-production situations, since it will make Druid 
attempt to recover from 
problems automatically, even if they lead to quiet dropping or duplicating of 
data.|no (default == false)|
+|`resetOffsetAutomatically`|Boolean|Controls behavior when Druid needs to read 
Kinesis messages that are no longer available.<br/><br/>If false, the exception 
will bubble up, which will cause your tasks to fail and ingestion to halt. If 
this occurs, manual intervention is required to correct the situation; 
potentially using the [Reset Supervisor 
API](../../api-reference/api-reference.md#supervisors). This mode is useful for 
production, since it will make you aware of issues with ingestion.<br/><br/>If 
true, Druid will automatically reset to the earlier or latest sequence number 
available in Kinesis, based on the value of the `useEarliestSequenceNumber` 
property (earliest if true, latest if false). Please note that this can lead to 
data being *DROPPED* (if `useEarliestSequenceNumber` is false) or *DUPLICATED* 
(if `useEarliestSequenceNumber` is true) without your knowledge. Messages will 
be logged indicating that a reset has occurred, but ingestion will continue. 
This mode is useful for non-production situations, since it will make Druid 
attempt to recover from 
problems automatically, even if they lead to quiet dropping or duplicating of 
data.|no (default == false)|
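As a rough illustration of the `resetOffsetAutomatically` behavior described in the table row above, the decision can be modeled as a small Java class. This is a hypothetical sketch, not Druid's actual implementation. `OffsetResetPolicy`, `ResetTarget`, and `resolve` are invented names, and Druid's real Kinesis indexing code handles far more than this.

```java
// Hypothetical model of the documented resetOffsetAutomatically behavior.
// OffsetResetPolicy, ResetTarget, and resolve() are illustrative names, not Druid APIs.
public class OffsetResetPolicy {

    public enum ResetTarget { EARLIEST, LATEST }

    private final boolean resetOffsetAutomatically;
    private final boolean useEarliestSequenceNumber;

    public OffsetResetPolicy(boolean resetOffsetAutomatically, boolean useEarliestSequenceNumber) {
        this.resetOffsetAutomatically = resetOffsetAutomatically;
        this.useEarliestSequenceNumber = useEarliestSequenceNumber;
    }

    /**
     * Called when the requested sequence number is no longer available in Kinesis.
     * Returns where to resume reading, or throws if automatic reset is disabled.
     */
    public ResetTarget resolve(String shardId) {
        if (!resetOffsetAutomatically) {
            // false: the error bubbles up, the task fails, and ingestion halts
            // until someone intervenes (e.g. via the Reset Supervisor API).
            throw new IllegalStateException(
                "Sequence number for shard " + shardId + " is no longer available");
        }
        // true: log the reset and keep going. EARLIEST can re-read data (duplicates);
        // LATEST skips everything between the lost position and the tip (drops).
        ResetTarget target = useEarliestSequenceNumber ? ResetTarget.EARLIEST : ResetTarget.LATEST;
        System.out.println("Resetting shard " + shardId + " to " + target);
        return target;
    }
}
```

Throwing on `false` mirrors the documented "fail loudly" production mode. The `true` branch only chooses a target and logs it, which is exactly why data can be silently dropped or duplicated.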

Review Comment:
   Formatting tools can be tricky, as every contributor might be using their own 
preferred one, unless there is one recommended as a Druid standard. There is 
only so much consistency we can (or would want to) enforce, as, after all, every 
developer has their own style. Think of it as using a classic 
`for (i = 0; i < n; ++i)` loop vs. a `for-each` loop vs. the `forEach` operator 
from the Streams API. In some cases, one style might improve readability or 
reduce object creation, but in others, they really make no difference.
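   To make the loop analogy concrete, here is a minimal Java sketch of the three styles mentioned, each summing the same list. The class and method names are illustrative only; the point is that the three forms differ in style, not in result.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Three equivalent ways to sum a list of integers.
public class LoopStyles {

    // Classic indexed loop: explicit counter, condition, and increment.
    static int sumClassic(List<Integer> values) {
        int sum = 0;
        for (int i = 0; i < values.size(); ++i) {
            sum += values.get(i);
        }
        return sum;
    }

    // Enhanced "for-each" loop: iterates via the list's Iterator.
    static int sumForEach(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Streams API forEach: needs a mutable holder because lambdas and method
    // references can only capture effectively final locals.
    // (mapToInt(Integer::intValue).sum() would be the more idiomatic stream form.)
    static int sumStream(List<Integer> values) {
        AtomicInteger sum = new AtomicInteger();
        values.stream().forEach(sum::addAndGet);
        return sum.get();
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(1, 2, 3, 4);
        System.out.println(sumClassic(values) + " " + sumForEach(values) + " " + sumStream(values));
    }
}
```

   Running `main` prints `10 10 10`.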
   
   I am okay with the change to the list item marker, as it is fairly small. But 
this description of `resetOffsetAutomatically` is rather complicated, and I 
would prefer we touch it only when needed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
