david przybill created SPARK-18620:
--------------------------------------
Summary: Spark Streaming + Kinesis : Receiver MaxRate is violated
Key: SPARK-18620
URL: https://issues.apache.org/jira/browse/SPARK-18620
Project: Spark
Issue Type: Bug
Components: DStreams
Affects Versions: 2.0.2
Reporter: david przybill
Priority: Minor
I am calling spark-submit with maxRate set. I have a single Kinesis receiver
and 1-second batches:
spark-submit --conf spark.streaming.receiver.maxRate=10 ....
However, a single batch can greatly exceed the configured maxRate; for
example, I am getting 300 records in one batch.
It looks like the Kinesis receiver is completely ignoring the
spark.streaming.receiver.maxRate configuration.
If you look inside KinesisReceiver.onStart, you see:
    val kinesisClientLibConfiguration =
      new KinesisClientLibConfiguration(checkpointAppName, streamName,
          awsCredProvider, workerId)
        .withKinesisEndpoint(endpointUrl)
        .withInitialPositionInStream(initialPositionInStream)
        .withTaskBackoffTimeMillis(500)
        .withRegionName(regionName)
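This constructor ends up calling another constructor that fills in many default
values for the configuration. One of those values is DEFAULT_MAX_RECORDS, a
constant set to 10,000 records. Since onStart does not override it, each fetch
from Kinesis can apparently return up to 10,000 records regardless of maxRate.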
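
One possible direction, as a minimal sketch rather than a tested patch: derive a
per-fetch record cap from spark.streaming.receiver.maxRate and hand it to the
KCL configuration. The object and method names below (KinesisMaxRecordsSketch,
maxRecordsFor) are hypothetical and used only for illustration. The sketch
assumes that a maxRate of 0 or less means "no limit", and that capping records
per fetch is an acceptable approximation of a per-second rate.

```scala
object KinesisMaxRecordsSketch {
  // Default per-fetch limit mentioned above (DEFAULT_MAX_RECORDS in the KCL).
  val DefaultMaxRecords: Int = 10000

  // Hypothetical helper: choose a per-fetch cap from the configured maxRate.
  // Assumption: maxRate <= 0 means no rate limit was requested.
  def maxRecordsFor(maxRate: Long): Int =
    if (maxRate <= 0) DefaultMaxRecords
    else math.min(maxRate, DefaultMaxRecords.toLong).toInt

  def main(args: Array[String]): Unit = {
    // With spark.streaming.receiver.maxRate=10 the cap would be 10.
    println(maxRecordsFor(10L))
    // With no limit configured the KCL default is kept.
    println(maxRecordsFor(0L))
  }
}
```

In KinesisReceiver.onStart the result could then be applied with
.withMaxRecords(...) on the KinesisClientLibConfiguration shown above. Note that
this limits records per fetch, not per second, so it would only narrow the gap
rather than strictly enforce maxRate.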
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)