[ https://issues.apache.org/jira/browse/SPARK-25721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karthikeyan Ravi updated SPARK-25721: ------------------------------------- Attachment: Screen Shot 2019-09-16 at 12.27.25 PM.png > maxRate configuration not being used in Kinesis receiver > -------------------------------------------------------- > > Key: SPARK-25721 > URL: https://issues.apache.org/jira/browse/SPARK-25721 > Project: Spark > Issue Type: Bug > Components: DStreams > Affects Versions: 2.2.0 > Reporter: Zhaobo Yu > Priority: Major > Attachments: Screen Shot 2019-09-16 at 12.27.25 PM.png, > rate_violation.png > > > In the onStart() function of KinesisReceiver class, the > KinesisClientLibConfiguration object is initialized in the following way, > val kinesisClientLibConfiguration = new KinesisClientLibConfiguration( > checkpointAppName, > streamName, > kinesisProvider, > dynamoDBCreds.map(_.provider).getOrElse(kinesisProvider), > cloudWatchCreds.map(_.provider).getOrElse(kinesisProvider), > workerId) > .withKinesisEndpoint(endpointUrl) > .withInitialPositionInStream(initialPositionInStream) > .withTaskBackoffTimeMillis(500) > .withRegionName(regionName) > > As you can see there is no withMaxRecords() in initialization, so > KinesisClientLibConfiguration will set it to 10000 by default since it has > been hard coded as this way, > public static final int DEFAULT_MAX_RECORDS = 10000; > In such a case, the receiver will not fulfill any maxRate setting we set if > it's less than 10k, worse still, it will cause > ProvisionedThroughputExceededException from Kinesis, especially when we > restart the streaming application. > > Attached rate_violation.png, we have a spark streaming application that has > 40 receivers, which is set to consume 1 record per second. Within 5 minutes > the spark streaming application should take no more than 12k records/5 > minutes (40*60*5 = 12k), but cloudwatch metrics shows it was consuming more > than that, which is almost at the rate of 22k records/5 minutes. -- This message was sent by Atlassian Jira (v8.3.2#803003) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org