[ https://issues.apache.org/jira/browse/SPARK-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15578133#comment-15578133 ]

Cody Koeninger commented on SPARK-17938:
----------------------------------------

There was pretty extensive discussion of this on the mailing list; we should 
link to or summarize it here.

A couple of things here:

 100 is the default minimum rate for the PID rate estimator (PIDRateEstimator). 
If you're willing to write code, add more logging to determine why that rate 
isn't being configured, or hardcode it to a different number. I have 
successfully adjusted that rate using Spark configuration.
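
For what it's worth, here's a minimal sketch of adjusting that floor through 
configuration; it assumes the stock PID estimator, whose minimum is controlled 
by spark.streaming.backpressure.pid.minRate (default 100):

  import org.apache.spark.SparkConf

  // Sketch only: lower the PID estimator's floor so the computed rate
  // can actually drop below the default minimum of 100.
  val conf = new SparkConf()
    .set("spark.streaming.backpressure.enabled", "true")
    .set("spark.streaming.backpressure.pid.minRate", "10") // default is 100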

The other thing is that if your system takes far longer than 1 second to 
process 100k records, then 100k obviously isn't a reasonable max. Many large 
batches will already have been defined while that first batch is running, 
before backpressure is involved at all. Try a lower max.
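
To put rough numbers on it (illustrative only, not a recommendation): with a 
1 second batch and maxRatePerPartition left at 100000, every batch defined 
while the slow one is still running can pull up to 100000 records per 
partition. Something like this caps it closer to real throughput:

  import org.apache.spark.SparkConf

  // Illustrative value only; tune the cap to what your job can actually
  // process per partition per second.
  val conf = new SparkConf()
    .set("spark.streaming.backpressure.enabled", "true")
    .set("spark.streaming.kafka.maxRatePerPartition", "5000") // instead of 100000

  // Upper bound per batch is roughly
  // maxRatePerPartition * numPartitions * batchDurationInSeconds records.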

> Backpressure rate not adjusting
> -------------------------------
>
>                 Key: SPARK-17938
>                 URL: https://issues.apache.org/jira/browse/SPARK-17938
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 2.0.0, 2.0.1
>            Reporter: Samy Dindane
>
> spark-streaming and spark-streaming-kafka-0-10 are both at version 2.0.1; the 
> behavior is the same with 2.0.0, though.
> spark.streaming.kafka.consumer.poll.ms is set to 30000
> spark.streaming.kafka.maxRatePerPartition is set to 100000
> spark.streaming.backpressure.enabled is set to true
> `batchDuration` of the streaming context is set to 1 second.
> I consume a Kafka topic using KafkaUtils.createDirectStream().
> My system can handle batches of 100k records, but it takes more than 1 second 
> to process them all. I'd thus expect the backpressure to reduce the number of 
> records fetched in the next batch so that the processing delay stays below 
> 1 second.
> However, this does not happen: the backpressure rate stays the same, stuck at 
> `100.0`, no matter how the other variables change (processing time, error, 
> etc.).
> Here's a log showing how all these variables change while the chosen rate 
> stays the same: https://gist.github.com/Dinduks/d9fa67fc8a036d3cad8e859c508acdba 
> (I would have attached a file but I don't see how to).
> Is this the expected behavior and am I missing something, or is this a bug?
> I'll gladly help by providing more information or writing code if necessary.
> Thank you.
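
For anyone trying to reproduce this, the setup described above corresponds 
roughly to the sketch below; the broker address, consumer group, topic name, 
and the processing stub are placeholders, not taken from the report:

  import org.apache.kafka.common.serialization.StringDeserializer
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
  import org.apache.spark.streaming.kafka010.KafkaUtils
  import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

  object BackpressureRepro {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf()
        .setAppName("backpressure-repro")
        .set("spark.streaming.kafka.consumer.poll.ms", "30000")
        .set("spark.streaming.kafka.maxRatePerPartition", "100000")
        .set("spark.streaming.backpressure.enabled", "true")

      // 1 second batch duration, as in the report
      val ssc = new StreamingContext(conf, Seconds(1))

      val kafkaParams = Map[String, Object](
        "bootstrap.servers" -> "localhost:9092",    // placeholder
        "key.deserializer" -> classOf[StringDeserializer],
        "value.deserializer" -> classOf[StringDeserializer],
        "group.id" -> "backpressure-test",          // placeholder
        "auto.offset.reset" -> "earliest"
      )

      val stream = KafkaUtils.createDirectStream[String, String](
        ssc,
        PreferConsistent,
        Subscribe[String, String](Seq("my-topic"), kafkaParams)) // placeholder topic

      // Stand-in for the real processing; the reported job takes well over
      // 1 second per 100k-record batch.
      stream.foreachRDD { rdd => println(s"batch size: ${rdd.count()}") }

      ssc.start()
      ssc.awaitTermination()
    }
  }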


