Re: [PR] Add supervisor actions (druid)

via GitHub Thu, 18 Apr 2024 10:57:31 -0700


techdocsmith commented on code in PR #16276:
URL: https://github.com/apache/druid/pull/16276#discussion_r1571158174



##########
docs/ingestion/supervisor.md:
##########
@@ -226,7 +226,7 @@ When an Overlord gains leadership, either by being started 
or as a result of ano
 
 ### Schema and configuration changes
 
-Schema and configuration changes are handled by submitting the new supervisor 
spec. The Overlord initiates a graceful shutdown of the existing supervisor. 
The running supervisor signals its tasks to stop reading and begin publishing, 
exiting itself. Druid then uses the provided configuration to create a new 
supervisor. Druid submits a new schema while retaining existing publishing 
tasks and starts new tasks at the previous task offsets.
+To make schema or configuration changes, you must submit a new supervisor 
spec. The Overlord initiates a graceful shutdown of the existing supervisor. 
The running supervisor signals its tasks to stop reading and begin publishing, 
exiting itself. Druid then uses the provided configuration to create a new 
supervisor. Druid submits a new schema while retaining existing publishing 
tasks and starts new tasks at the previous task offsets.

Review Comment:
   ```suggestion
   To make schema or configuration changes, you must submit a new supervisor 
spec. The Overlord initiates a graceful shutdown of the existing supervisor. 
The running supervisor signals its tasks to stop reading and begin publishing, 
exiting itself. Druid then uses the new configuration to create a new 
supervisor. Druid submits the updated schema while retaining existing 
publishing tasks. It also starts new tasks at the previous task offsets.
   ```
   nit



##########
docs/ingestion/supervisor.md:
##########
@@ -182,14 +182,14 @@ The following example shows a supervisor spec with 
`lagBased` autoscaler:
 
 The `tuningConfig` object is optional. If you don't specify the `tuningConfig` 
object, Druid uses the default configuration settings.
 
-The following table outlines the `tuningConfig` configuration properties that 
apply to both Apache Kafka and Amazon Kinesis ingestion methods.
-For configuration properties specific to Apache Kafka and Amazon Kinesis, see 
[Kafka tuning configuration](kafka-ingestion.md#tuning-configuration) and 
[Kinesis tuning configuration](kinesis-ingestion.md#tuning-configuration) 
respectively.
+The following table outlines the `tuningConfig` configuration properties that 
apply to both Kafka and Kinesis ingestion methods.
+For configuration properties specific to Kafka and Kinesis, see [Kafka tuning 
configuration](kafka-ingestion.md#tuning-configuration) and [Kinesis tuning 
configuration](kinesis-ingestion.md#tuning-configuration) respectively.
 
 |Property|Type|Description|Required|Default|
 |--------|----|-----------|--------|-------|
 |`type`|String|The tuning type code for the ingestion method. One of `kafka` 
or `kinesis`.|Yes||
 |`maxRowsInMemory`|Integer|The number of rows to accumulate before persisting. 
This number represents the post-aggregation rows. It is not equivalent to the 
number of input events, but the resulting number of aggregated rows. Druid uses 
`maxRowsInMemory` to manage the required JVM heap size. The maximum heap memory 
usage for indexing scales is `maxRowsInMemory * (2 + maxPendingPersists)`. 
Normally, you don't need to set this, but depending on the nature of data, if 
rows are short in terms of bytes, you may not want to store a million rows in 
memory and this value should be set.|No|150000|
-|`maxBytesInMemory`|Long|The number of bytes to accumulate in heap memory 
before persisting. This is based on a rough estimate of memory usage and not 
actual usage. Normally, this is computed internally. The maximum heap memory 
usage for indexing is `maxBytesInMemory * (2 + 
maxPendingPersists)`.|No|One-sixth of max JVM memory|
+|`maxBytesInMemory`|Long|The number of bytes to accumulate in heap memory 
before persisting. The value is based on a rough estimate of memory usage and 
not actual usage. Normally, it is computed internally. The maximum heap memory 
usage for indexing is `maxBytesInMemory * (2 + 
maxPendingPersists)`.|No|One-sixth of max JVM memory|

Review Comment:
   ```suggestion
   |`maxBytesInMemory`|Long|The number of bytes to accumulate in heap memory 
before persisting. The value is based on a rough estimate of memory usage and 
not actual usage. Normally, Druid computes the value internally. The maximum 
heap memory usage for indexing is `maxBytesInMemory * (2 + 
maxPendingPersists)`.|No|One-sixth of max JVM memory|
   ```
   nit: avoid passive voice
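
As a quick sanity check on the formula in this row, here is a minimal Python sketch of the arithmetic. The function name `max_indexing_heap_bytes` and the example values (a 6 GiB JVM heap and a `maxPendingPersists` of 0) are assumptions chosen for illustration; they are not taken from the PR.

```python
def max_indexing_heap_bytes(max_bytes_in_memory: int, max_pending_persists: int) -> int:
    """Upper bound described in the table row:
    maxBytesInMemory * (2 + maxPendingPersists)."""
    return max_bytes_in_memory * (2 + max_pending_persists)


GIB = 1024 ** 3

# Assumed example: a JVM started with a 6 GiB maximum heap.
jvm_max_heap = 6 * GIB

# Documented default for maxBytesInMemory: one-sixth of max JVM memory.
default_max_bytes = jvm_max_heap // 6

# maxPendingPersists=0 is an assumed value for this example.
bound = max_indexing_heap_bytes(default_max_bytes, max_pending_persists=0)
print(f"{bound / GIB} GiB")
```

Under those assumptions the bound works out to a third of the heap, which may be worth keeping in mind when reading "One-sixth of max JVM memory" in the Default column.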



##########
docs/ingestion/supervisor.md:
##########
@@ -200,23 +200,23 @@ For configuration properties specific to Apache Kafka and 
Amazon Kinesis, see [K
 |`indexSpecForIntermediatePersists`|Object|Defines segment storage format 
options to use at indexing time for intermediate persisted temporary segments. 
You can use `indexSpecForIntermediatePersists` to disable dimension/metric 
compression on intermediate segments to reduce memory required for final 
merging. However, disabling compression on intermediate segments might increase 
page cache use while they are used before getting merged into final segment 
published.|No||
 |`reportParseExceptions`|Boolean|DEPRECATED. If `true`, Druid throws 
exceptions encountered during parsing causing ingestion to halt. If `false`, 
Druid skips unparseable rows and fields. Setting `reportParseExceptions` to 
`true` overrides existing configurations for `maxParseExceptions` and 
`maxSavedParseExceptions`, setting `maxParseExceptions` to 0 and limiting 
`maxSavedParseExceptions` to not more than 1.|No|`false`|
 |`handoffConditionTimeout`|Long|Number of milliseconds to wait for segment 
handoff. Set to a value >= 0, where 0 means to wait indefinitely.|No|900000 (15 
minutes) for Kafka. 0 for Kinesis.|
-|`resetOffsetAutomatically`|Boolean|Resets partitions when the sequence number 
is unavailable. If set to `true`, Druid resets partitions to the earliest or 
latest Kafka sequence number or Kinesis offset, based on the value of 
`useEarliestSequenceNumber` or `useEarliestOffset` (earliest if `true`, latest 
if `false`). If set to `false`, the exception bubbles up causing tasks to fail 
and ingestion to halt. If this occurs, manual intervention is required to 
correct the situation, potentially through [resetting the 
supervisor](../api-reference/supervisor-api.md#reset-a-supervisor).|No|`false`|
+|`resetOffsetAutomatically`|Boolean|Resets partitions when the sequence number 
is unavailable. If set to `true`, Druid resets partitions to the earliest or 
latest offset, based on the value of `useEarliestSequenceNumber` or 
`useEarliestOffset` (earliest if `true`, latest if `false`). If set to `false`, 
the exception bubbles up causing tasks to fail and ingestion to halt. If this 
occurs, manual intervention is required to correct the situation, potentially 
through [resetting the 
supervisor](../api-reference/supervisor-api.md#reset-a-supervisor).|No|`false`|

Review Comment:
   ```suggestion
   |`resetOffsetAutomatically`|Boolean|Resets partitions when the sequence 
number is unavailable. If set to `true`, Druid resets partitions to the 
earliest or latest offset, based on the value of `useEarliestSequenceNumber` or 
`useEarliestOffset` (earliest if `true`, latest if `false`). If set to `false`, 
Druid surfaces the exception causing tasks to fail and ingestion to halt. If 
this occurs, manual intervention is required to correct the situation, 
potentially through [resetting the 
supervisor](../api-reference/supervisor-api.md#reset-a-supervisor).|No|`false`|
   ```
   Avoid the idiom "bubbles up".



##########
docs/ingestion/supervisor.md:
##########
@@ -339,6 +339,54 @@ SELECT * FROM sys.supervisors WHERE healthy=0;
 
 For more information on the supervisors system table, see [SUPERVISORS 
table](../querying/sql-metadata-tables.md#supervisors-table).
 
+## Manage a supervisor
+
+You can manage a supervisor from the web console or using the [Supervisor 
API](../api-reference/supervisor-api.md).

Review Comment:
   ```suggestion
   You can manage a supervisor from the web console or with the [Supervisor 
API](../api-reference/supervisor-api.md).
   ```



##########
docs/ingestion/supervisor.md:
##########
@@ -200,23 +200,23 @@ For configuration properties specific to Apache Kafka and 
Amazon Kinesis, see [K
 |`indexSpecForIntermediatePersists`|Object|Defines segment storage format 
options to use at indexing time for intermediate persisted temporary segments. 
You can use `indexSpecForIntermediatePersists` to disable dimension/metric 
compression on intermediate segments to reduce memory required for final 
merging. However, disabling compression on intermediate segments might increase 
page cache use while they are used before getting merged into final segment 
published.|No||
 |`reportParseExceptions`|Boolean|DEPRECATED. If `true`, Druid throws 
exceptions encountered during parsing causing ingestion to halt. If `false`, 
Druid skips unparseable rows and fields. Setting `reportParseExceptions` to 
`true` overrides existing configurations for `maxParseExceptions` and 
`maxSavedParseExceptions`, setting `maxParseExceptions` to 0 and limiting 
`maxSavedParseExceptions` to not more than 1.|No|`false`|
 |`handoffConditionTimeout`|Long|Number of milliseconds to wait for segment 
handoff. Set to a value >= 0, where 0 means to wait indefinitely.|No|900000 (15 
minutes) for Kafka. 0 for Kinesis.|
-|`resetOffsetAutomatically`|Boolean|Resets partitions when the sequence number 
is unavailable. If set to `true`, Druid resets partitions to the earliest or 
latest Kafka sequence number or Kinesis offset, based on the value of 
`useEarliestSequenceNumber` or `useEarliestOffset` (earliest if `true`, latest 
if `false`). If set to `false`, the exception bubbles up causing tasks to fail 
and ingestion to halt. If this occurs, manual intervention is required to 
correct the situation, potentially through [resetting the 
supervisor](../api-reference/supervisor-api.md#reset-a-supervisor).|No|`false`|
+|`resetOffsetAutomatically`|Boolean|Resets partitions when the sequence number 
is unavailable. If set to `true`, Druid resets partitions to the earliest or 
latest offset, based on the value of `useEarliestSequenceNumber` or 
`useEarliestOffset` (earliest if `true`, latest if `false`). If set to `false`, 
the exception bubbles up causing tasks to fail and ingestion to halt. If this 
occurs, manual intervention is required to correct the situation, potentially 
through [resetting the 
supervisor](../api-reference/supervisor-api.md#reset-a-supervisor).|No|`false`|
 |`workerThreads`|Integer|The number of threads that the supervisor uses to 
handle requests/responses for worker tasks, along with any other internal 
asynchronous operation.|No|`min(10, taskCount)`|
 |`chatRetries`|Integer|The number of times Druid retries HTTP requests to 
indexing tasks before considering tasks unresponsive.|No|8|
 |`httpTimeout`|ISO 8601 period|The period of time to wait for a HTTP response 
from an indexing task.|No|`PT10S`|
 |`shutdownTimeout`|ISO 8601 period|The period of time to wait for the 
supervisor to attempt a graceful shutdown of tasks before exiting.|No|`PT80S`|
 |`offsetFetchPeriod`|ISO 8601 period|Determines how often the supervisor 
queries the streaming source and the indexing tasks to fetch current offsets 
and calculate lag. If the user-specified value is below the minimum value of 
`PT5S`, the supervisor ignores the value and uses the minimum value 
instead.|No|`PT30S`|
 |`segmentWriteOutMediumFactory`|Object|The segment write-out medium to use 
when creating segments. See [Additional Peon configuration: 
SegmentWriteOutMediumFactory](../configuration/index.md#segmentwriteoutmediumfactory)
 for explanation and available options.|No|If not specified, Druid uses the 
value from `druid.peon.defaultSegmentWriteOutMediumFactory.type`.|
 |`logParseExceptions`|Boolean|If `true`, Druid logs an error message when a 
parsing exception occurs, containing information about the row where the error 
occurred.|No|`false`|
-|`maxParseExceptions`|Integer|The maximum number of parse exceptions that can 
occur before the task halts ingestion and fails. Overridden if 
`reportParseExceptions` is set.|No|unlimited|
-|`maxSavedParseExceptions`|Integer|When a parse exception occurs, Druid keeps 
track of the most recent parse exceptions. `maxSavedParseExceptions` limits the 
number of saved exception instances. These saved exceptions are available after 
the task finishes in the [task completion 
report](../ingestion/tasks.md#task-reports). Overridden if 
`reportParseExceptions` is set.|No|0|
+|`maxParseExceptions`|Integer|The maximum number of parse exceptions that can 
occur before the task halts ingestion and fails. Setting 
`reportParseExceptions` overrides this limit.|No|unlimited|
+|`maxSavedParseExceptions`|Integer|When a parse exception occurs, Druid keeps 
track of the most recent parse exceptions. `maxSavedParseExceptions` limits the 
number of saved exception instances. These saved exceptions are available after 
the task finishes in the [task completion 
report](../ingestion/tasks.md#task-reports). Setting `reportParseExceptions` 
overrides this limit.|No|0|
 
 ## Start a supervisor
 
 Druid starts a new supervisor when you submit a supervisor spec.
-You can submit the supervisor spec using the Druid console [data 
loader](../operations/web-console.md#data-loader) or by calling the [Supervisor 
API](../api-reference/supervisor-api.md).
+You can submit the supervisor spec using the Druid web console [data 
loader](../operations/web-console.md#data-loader) or by calling the [Supervisor 
API](../api-reference/supervisor-api.md).

Review Comment:
   ```suggestion
   You can submit the supervisor spec in the Druid web console [data 
loader](../operations/web-console.md#data-loader) or with the [Supervisor 
API](../api-reference/supervisor-api.md).
   ```
   nit
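
To make the "with the Supervisor API" path concrete, here is a hedged Python sketch of building the submission request. It assumes the spec is POSTed as JSON to `/druid/indexer/v1/supervisor` through a Router at `http://localhost:8888`; the helper name `submit_spec_request` and the placeholder spec are hypothetical, and the placeholder is deliberately not a complete, valid spec. The same call would apply to the schema and configuration change flow discussed earlier, since that flow also consists of submitting a new spec.

```python
import json
from urllib.request import Request


def submit_spec_request(spec: dict, base_url: str = "http://localhost:8888") -> Request:
    """Build (but do not send) a POST request that submits a supervisor spec."""
    body = json.dumps(spec).encode("utf-8")
    return Request(
        f"{base_url}/druid/indexer/v1/supervisor",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder only: a real spec needs dataSchema, ioConfig, and so on.
placeholder_spec = {"type": "kafka", "spec": {}}

req = submit_spec_request(placeholder_spec)
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` requires a running cluster and a complete spec.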



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

