zhangyue19921010 commented on a change in pull request #10524:
URL: https://github.com/apache/druid/pull/10524#discussion_r570013313
##########
File path: docs/development/extensions-core/kafka-ingestion.md
##########
@@ -146,6 +146,26 @@ A sample supervisor spec is shown below:
|`lateMessageRejectionStartDateTime`|ISO8601 DateTime|Configure tasks to
reject messages with timestamps earlier than this date time; for example if
this is set to `2016-01-01T11:00Z` and the supervisor creates a task at
*2016-01-01T12:00Z*, messages with timestamps earlier than *2016-01-01T11:00Z*
will be dropped. This may help prevent concurrency issues if your data stream
has late messages and you have multiple pipelines that need to operate on the
same segments (e.g. a realtime and a nightly batch ingestion pipeline).|no
(default == none)|
|`lateMessageRejectionPeriod`|ISO8601 Period|Configure tasks to reject
messages with timestamps earlier than this period before the task was created;
for example if this is set to `PT1H` and the supervisor creates a task at
*2016-01-01T12:00Z*, messages with timestamps earlier than *2016-01-01T11:00Z*
will be dropped. This may help prevent concurrency issues if your data stream
has late messages and you have multiple pipelines that need to operate on the
same segments (e.g. a realtime and a nightly batch ingestion pipeline). Please
note that only one of `lateMessageRejectionPeriod` or
`lateMessageRejectionStartDateTime` can be specified.|no (default == none)|
|`earlyMessageRejectionPeriod`|ISO8601 Period|Configure tasks to reject
messages with timestamps later than this period after the task reached its
taskDuration; for example if this is set to `PT1H`, the taskDuration is set to
`PT1H` and the supervisor creates a task at *2016-01-01T12:00Z*, messages with
timestamps later than *2016-01-01T14:00Z* will be dropped. **Note:** Tasks
sometimes run past their task duration, for example, in cases of supervisor
failover. Setting earlyMessageRejectionPeriod too low may cause messages to be
dropped unexpectedly whenever a task runs past its originally configured task
duration.|no (default == none)|
+|`dynamicAllocationTasksProperties`|Object|Specifies how to auto-scale the number of Kafka ingest tasks based on lag metrics. See [Dynamic Allocation Tasks Properties](#dynamic-allocation-tasks-properties) for details.|no (default == null)|
+
+#### Dynamic Allocation Tasks Properties
+
+| Property | Description | Default |
+| ------------- | ------------- | ------------- |
+| `enableDynamicAllocationTasks` | Whether to enable auto scaling of the number of tasks | false |
Review comment:
Like I mentioned above, this is a compromise. The current algorithm is relatively simple, but it can meet most scenarios (for example, regular traffic peaks or sudden traffic spikes).
However, in extreme cases, if users don't set the scale-related configs properly, it will trigger scale actions too frequently and create lots of small segments. In that case, the user needs to control taskCount manually, and this config makes disabling/enabling more convenient.
Once the algorithm is smart enough, it would be better to remove this parameter.
As for the default value of false, I think it is insurance against users enabling the autoscaler by accident, e.g. by leaving `"autoscalerConfig": {}` behind after deleting all the autoscaler-related configs.
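
For illustration, a supervisor-spec fragment enabling the feature might look like the following. Only `dynamicAllocationTasksProperties` and `enableDynamicAllocationTasks` appear in the diff above; the placement inside `ioConfig` and the surrounding fields are assumptions based on the usual Kafka supervisor spec layout, not part of this PR:

```json
{
  "type": "kafka",
  "spec": {
    "ioConfig": {
      "topic": "metrics",
      "taskCount": 1,
      "dynamicAllocationTasksProperties": {
        "enableDynamicAllocationTasks": true
      }
    }
  }
}
```

Per the comment above, omitting the object entirely (or leaving `enableDynamicAllocationTasks` at its default of `false`) keeps the autoscaler off, so an accidental empty object does not enable scaling.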
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]