[
https://issues.apache.org/jira/browse/SPARK-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629049#comment-15629049
]
Fabien LD commented on SPARK-17938:
-----------------------------------
For us, backpressure works fine with:
- spark 2.0.1
- kafka 0.10
- scala 2.11
Here are the settings we have:
```
spark.streaming.backpressure.initialRate=1000
spark.streaming.receiver.maxRate=10000
spark.streaming.kafka.maxRatePerPartition=2000
```
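Roughly, how these limits combine for the direct Kafka stream can be sketched as below. This is a simplification under stated assumptions: the real DirectKafkaInputDStream weights the backpressure estimate by per-partition lag rather than splitting it evenly, and the function name is mine:

```python
def records_per_partition(batch_secs, num_partitions,
                          max_rate_per_partition, backpressure_rate=None):
    """Sketch: upper bound on records fetched per partition per batch.

    max_rate_per_partition mirrors spark.streaming.kafka.maxRatePerPartition
    (records/second); backpressure_rate stands in for the rate estimated by
    the backpressure controller (records/second, across all partitions).
    """
    per_partition_rate = max_rate_per_partition
    if backpressure_rate is not None:
        # the estimated total rate is capped by the static per-partition limit
        per_partition_rate = min(max_rate_per_partition,
                                 backpressure_rate / num_partitions)
    return int(per_partition_rate * batch_secs)

# With the settings above and a 2s batch over 10 partitions:
records_per_partition(2, 10, 2000)                            # static cap only
records_per_partition(2, 10, 2000, backpressure_rate=10000)   # backpressure wins
```

This also illustrates why `spark.streaming.receiver.maxRate` does not appear in the calculation: it applies to receiver-based streams, while the direct Kafka approach is governed by `maxRatePerPartition` and the backpressure estimate.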
The only problem we have is with `spark.streaming.receiver.maxRate`.
Streaming starts at 2000 events per partition (processed in ~4-5s after a few
initial batches) and then backpressure starts doing its job. But the actual
average rate then climbs to around 25000 records per second (a bit more than
50000 records per 2s batch), which is higher than the configured maxRate.
Later, once we have caught up with live record production in Kafka, it drops
to a lower rate.
=> maxRate is never applied. This is a problem, because an enforced maxRate
would let us control the warm-up phase and raise maxRatePerPartition (which we
currently have to keep low for the first few batches, before backpressure
kicks in) to absorb spikes that happen later in some topics.
> Backpressure rate not adjusting
> -------------------------------
>
> Key: SPARK-17938
> URL: https://issues.apache.org/jira/browse/SPARK-17938
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 2.0.0, 2.0.1
> Reporter: Samy Dindane
>
> spark-streaming and spark-streaming-kafka-0-10 are both at version 2.0.1.
> The same behavior occurs with 2.0.0, though.
> spark.streaming.kafka.consumer.poll.ms is set to 30000
> spark.streaming.kafka.maxRatePerPartition is set to 100000
> spark.streaming.backpressure.enabled is set to true
> `batchDuration` of the streaming context is set to 1 second.
> I consume a Kafka topic using KafkaUtils.createDirectStream().
> My system can handle batches of 100k records, but it takes more than 1
> second to process them all. I'd thus expect backpressure to reduce the
> number of records fetched in the next batch, keeping the processing delay
> under 1 second.
> This does not happen, though: the backpressure rate stays stuck at `100.0`,
> no matter how the other variables change (processing time, error, etc.).
> Here's a log showing how all these variables change while the chosen rate
> stays the same: https://gist.github.com/Dinduks/d9fa67fc8a036d3cad8e859c508acdba
> (I would have attached a file but I don't see how).
> Is this the expected behavior and I am missing something, or is this a bug?
> I'll gladly help by providing more information or writing code if necessary.
> Thank you.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)