Arwiim opened a new issue, #17467:
URL: https://github.com/apache/druid/issues/17467
I'm experiencing an issue where the fetchDelayMillis and recordsPerFetch
parameters in the tuningConfig of a Kinesis ingestion task are not being
applied. These parameters seem to be ignored or removed when the task is
submitted, resulting in the ingestion task exceeding Kinesis shard read
throughput limits and causing ProvisionedThroughputExceededException errors.
Despite specifying these parameters in the ingestion spec, they do not
appear in the running task's configuration, and the task continues to exceed
the Kinesis API call limits.
Steps to Reproduce:
Set Up Kinesis Stream:
A Kinesis stream named imply_ranty_nrules_stream with 6 shards.
Approximately 2,000 records per second are entering the stream.
Create the Ingestion Spec:
Here's the ingestion spec used:
```
{
  "type": "kinesis",
  "spec": {
    "dataSchema": {
      "dataSource": "imply_ranty_nrules_stream_v2",
      "timestampSpec": {
        "column": "time",
        "format": "iso",
        "missingValue": null
      },
      "dimensionsSpec": {
        "dimensions": [
          // Dimension definitions
        ],
        "dimensionExclusions": [
          "__time",
          "!!!_no_such_column_!!!"
        ],
        "includeAllDimensions": false,
        "useSchemaDiscovery": false
      },
      "metricsSpec": [],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": {
          "type": "none"
        },
        "rollup": false,
        "intervals": []
      },
      "transformSpec": {
        "filter": null,
        "transforms": [
          // Transform definitions
        ]
      }
    },
    "ioConfig": {
      "type": "kinesis",
      "stream": "imply_ranty_nrules_stream",
      "endpoint": "kinesis.us-east-1.amazonaws.com",
      "inputFormat": {
        "type": "json"
      },
      "replicas": 1,
      "taskCount": 1,
      "taskDuration": "PT3600S",
      "startDelay": "PT5S",
      "period": "PT30S",
      "useEarliestSequenceNumber": false,
      "completionTimeout": "PT1800S",
      "idleConfig": {
        "enabled": false
      }
    },
    "tuningConfig": {
      "type": "KinesisTuningConfig",
      "maxRowsInMemory": 150000,
      "maxRowsPerSegment": 750000,
      "maxTotalRows": 3000000,
      "intermediatePersistPeriod": "PT10M",
      "fetchDelayMillis": 500,
      "recordsPerFetch": 1000,
      // Additional tuning configurations
    }
  },
  "context": null,
  "suspended": false
}
```
Note: I've set fetchDelayMillis and recordsPerFetch in the tuningConfig.
Observe the Running Task Configuration:
Checked the task details in the Druid console under the Tasks section.
Noticed that the fetchDelayMillis and recordsPerFetch parameters are missing
from the running task's tuningConfig.
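The comparison described in this step can also be scripted instead of checked by eye. The following is only a minimal sketch: it assumes the submitted tuningConfig and the running task's tuningConfig have already been loaded as Python dicts (for example, copied from the console), and the function name `dropped_tuning_keys` and the `running` sample are hypothetical illustrations, not actual Druid output.

```python
from typing import Any, Dict, List


def dropped_tuning_keys(submitted: Dict[str, Any], running: Dict[str, Any]) -> List[str]:
    """Return the keys set in the submitted tuningConfig that are absent from the running one."""
    return sorted(key for key in submitted if key not in running)


# Subset of the tuningConfig from the ingestion spec in this report.
submitted = {
    "type": "KinesisTuningConfig",
    "maxRowsInMemory": 150000,
    "fetchDelayMillis": 500,
    "recordsPerFetch": 1000,
}

# Hypothetical running-task tuningConfig, made up to mirror the symptom described here.
running = {
    "type": "KinesisTuningConfig",
    "maxRowsInMemory": 150000,
}

print(dropped_tuning_keys(submitted, running))
```

With the sample dicts above, the two parameters in question are the only keys reported as dropped.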
Monitor Logs and Metrics:
Despite the parameters being set in the ingestion spec, the ingestion task
continues to exceed Kinesis read limits.
The following errors appear in the MiddleManager logs:
```
com.amazonaws.services.kinesis.model.ProvisionedThroughputExceededException:
Rate exceeded for Shard - [shard details] (Service: AmazonKinesis; Status Code:
400; Error Code: ProvisionedThroughputExceededException; ...)
```
Expected Behavior:
The ingestion task should apply the fetchDelayMillis and recordsPerFetch
parameters from the tuningConfig.
The task should respect the Kinesis shard read throughput limits by
throttling GetRecords calls according to the specified parameters.
The parameters should appear in the running task's configuration when
inspected.
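To make the expected throttling concrete, here is a rough back-of-the-envelope sketch. It assumes a reader waits at least fetchDelayMillis between consecutive GetRecords calls on the same shard and ignores request latency; the function names and the `readers_per_shard` parameter are hypothetical. The limit of 5 GetRecords calls per second per shard is the AWS-documented read limit; the separate 2 MB per second per shard limit is not modeled here.

```python
# AWS-documented limit on GetRecords calls per second for a single shard.
KINESIS_MAX_GETRECORDS_PER_SEC_PER_SHARD = 5


def max_fetches_per_second(fetch_delay_millis: int) -> float:
    """Upper bound on GetRecords calls per second per shard for one reader."""
    if fetch_delay_millis <= 0:
        return float("inf")  # no delay configured: the call rate is unbounded in this model
    return 1000.0 / fetch_delay_millis


def within_call_limit(fetch_delay_millis: int, readers_per_shard: int = 1) -> bool:
    """True if the modeled call rate stays at or below the per-shard GetRecords limit."""
    rate = max_fetches_per_second(fetch_delay_millis) * readers_per_shard
    return rate <= KINESIS_MAX_GETRECORDS_PER_SEC_PER_SHARD


def max_records_per_second(fetch_delay_millis: int, records_per_fetch: int) -> float:
    """Upper bound on records read per second per shard for one reader."""
    return max_fetches_per_second(fetch_delay_millis) * records_per_fetch


# Values from the tuningConfig in this report.
print(max_fetches_per_second(500))
print(within_call_limit(500))
print(max_records_per_second(500, 1000))
```

Under these assumptions, fetchDelayMillis = 500 caps a single reader at 2 calls per second per shard, which is below the limit, while still allowing up to 2,000 records per second per shard, well above the roughly 333 records per second per shard implied by 2,000 records per second over 6 shards.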
Actual Behavior:
The fetchDelayMillis and recordsPerFetch parameters are not applied.
These parameters are missing from the running task's tuningConfig.
The ingestion task exceeds Kinesis read throughput limits, resulting in
ProvisionedThroughputExceededException errors.
Increasing taskCount or adjusting other parameters does not resolve the
issue.