[ https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940405#comment-16940405 ]
Sebastian Arzt commented on SPARK-18371:
----------------------------------------
[~rkarthikeyan] At first glance, I cannot find backpressure support in the
Kinesis receiver yet. I think your problem should be investigated
independently. I suggest creating a new ticket with instructions to reproduce
your findings.
> Spark Streaming backpressure bug - generates a batch with large number of
> records
> ---------------------------------------------------------------------------------
>
> Key: SPARK-18371
> URL: https://issues.apache.org/jira/browse/SPARK-18371
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 2.0.0
> Reporter: mapreduced
> Assignee: Sebastian Arzt
> Priority: Major
> Fix For: 2.4.0
>
> Attachments: 01.png, 02.png, GiantBatch2.png, GiantBatch3.png,
> Giant_batch_at_23_00.png, Look_at_batch_at_22_14.png, Screen Shot 2019-09-16
> at 12.27.25 PM.png
>
>
> When the streaming job is configured with backpressure enabled
> (spark.streaming.backpressure.enabled=true), it generates a GIANT batch of
> records if the processing time plus the scheduling delay is (much) larger
> than batchDuration. This creates a huge backlog of records, and batches
> queue up for hours until the job chews through this giant batch.
> The expectation is that, within some time, backpressure reduces the number
> of records per batch to whatever the job can actually process.
> Attaching some screenshots; the issue seems to be quite easily
> reproducible.
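To make the expected behavior concrete, here is a minimal Python sketch of a PID-style rate controller, modeled on (but not identical to) the PIDRateEstimator that Spark Streaming's backpressure relies on. The class and method names are illustrative, not Spark's API, and the default gains (proportional 1.0, integral 0.2, derivative 0.0, minimum rate 100 records/s) are assumed to mirror Spark 2.x defaults. The point it shows: when a batch's scheduling delay grows (processing time exceeds the batch interval), the integral term should pull the ingestion rate down instead of letting a giant batch form.

```python
from typing import Optional


class PIDRateEstimator:
    """Simplified PID rate estimator (hypothetical sketch, not Spark's code).

    Rates are in records per second; times and delays are in milliseconds.
    """

    def __init__(self, batch_interval_ms: int, proportional: float = 1.0,
                 integral: float = 0.2, derivative: float = 0.0,
                 min_rate: float = 100.0) -> None:
        self.batch_interval_ms = batch_interval_ms
        self.proportional = proportional
        self.integral = integral
        self.derivative = derivative
        self.min_rate = min_rate
        self._first_run = True
        self._latest_time = -1
        self._latest_rate = -1.0
        self._latest_error = -1.0

    def compute(self, time_ms: int, num_elements: int,
                processing_delay_ms: int,
                scheduling_delay_ms: int) -> Optional[float]:
        """Return a new ingestion rate, or None if no update is made."""
        # Only update on a newer batch that actually processed records.
        if not (time_ms > self._latest_time and num_elements > 0
                and processing_delay_ms > 0):
            return None
        delay_since_update = (time_ms - self._latest_time) / 1000.0
        # Rate at which the last batch was actually processed.
        processing_rate = num_elements / processing_delay_ms * 1000.0
        # Positive error: we were ingesting faster than we could process.
        error = self._latest_rate - processing_rate
        # Backlog built up as scheduling delay, expressed as a rate.
        historical_error = (scheduling_delay_ms * processing_rate
                            / self.batch_interval_ms)
        d_error = (error - self._latest_error) / delay_since_update
        new_rate = max(self._latest_rate
                       - self.proportional * error
                       - self.integral * historical_error
                       - self.derivative * d_error,
                       self.min_rate)
        self._latest_time = time_ms
        if self._first_run:
            # First batch only seeds the controller with the observed rate.
            self._latest_rate = processing_rate
            self._latest_error = 0.0
            self._first_run = False
            return None
        self._latest_rate = new_rate
        self._latest_error = error
        return new_rate


if __name__ == "__main__":
    est = PIDRateEstimator(batch_interval_ms=1000)
    print(est.compute(1000, 10000, 4000, 0))     # seeds the controller
    print(est.compute(2000, 10000, 4000, 3000))  # backlog lowers the rate
```

With a 1000 ms batch interval, a batch of 10000 records that takes 4000 ms to process seeds the rate at 2500 records/s; if the next batch also shows 3000 ms of scheduling delay, the integral term lowers the rate to 1000 records/s. This is the shrinking of per-batch size that the report expects to see.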
--
This message was sent by Atlassian Jira
(v8.3.4#803005)