[
https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Karthikeyan Ravi updated SPARK-18371:
-------------------------------------
Attachment: Screen Shot 2019-09-16 at 12.27.25 PM.png
> Spark Streaming backpressure bug - generates a batch with large number of
> records
> ---------------------------------------------------------------------------------
>
> Key: SPARK-18371
> URL: https://issues.apache.org/jira/browse/SPARK-18371
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 2.0.0
> Reporter: mapreduced
> Assignee: Sebastian Arzt
> Priority: Major
> Fix For: 2.4.0
>
> Attachments: 01.png, 02.png, GiantBatch2.png, GiantBatch3.png,
> Giant_batch_at_23_00.png, Look_at_batch_at_22_14.png, Screen Shot 2019-09-16
> at 12.27.25 PM.png
>
>
> When a streaming job is configured with backpressureEnabled=true, it
> generates a giant batch of records if the processing time plus scheduling
> delay is (much) larger than batchDuration. This creates a large backlog of
> records, and batches queue for hours until the job works through the giant
> batch.
> The expectation is that backpressure should, after some time, reduce the
> number of records per batch to what the job can actually process.
> The attached screenshots suggest that this issue is quite easily
> reproducible.
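
The condition and the expectation described in the report can be illustrated with a little arithmetic. The sketch below is not Spark's rate estimator and is not taken from the issue; the function names and the sample numbers are hypothetical and chosen only for illustration. It checks whether a batch is falling behind (processing time plus scheduling delay exceeds the batch duration) and computes the batch size the job could sustain at its observed processing rate.

```python
def is_falling_behind(processing_time_s, scheduling_delay_s, batch_duration_s):
    """True when a batch takes longer end to end than one batch interval."""
    return processing_time_s + scheduling_delay_s > batch_duration_s


def sustainable_batch_size(records_processed, processing_time_s, batch_duration_s):
    """Records that fit into one batch interval at the observed processing rate."""
    if processing_time_s <= 0:
        raise ValueError("processing_time_s must be positive")
    rate = records_processed / processing_time_s  # records per second
    return int(rate * batch_duration_s)


# Hypothetical numbers: a giant batch of 600,000 records took 120 s to process
# after waiting 30 s in the queue, with a 10 s batch interval.
print(is_falling_behind(120.0, 30.0, 10.0))
print(sustainable_batch_size(600_000, 120.0, 10.0))
```

Under these assumed numbers the job is falling behind, and a batch size near the sustainable value is roughly what the reporter expects backpressure to converge to.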