[
https://issues.apache.org/jira/browse/SPARK-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595374#comment-14595374
]
Tathagata Das commented on SPARK-7398:
--------------------------------------
I took a look at the whole design doc. It's very well composed, but the details
of the actual code changes are a little unclear. Now that you have a working
branch, I strongly recommend writing an additional design doc that skips all
the intro and background and focuses just on the code changes.
Here is a design doc for inspiration. This is the original design doc for the
Write Ahead Log.
https://docs.google.com/document/d/1vTCB5qVfyxQPlHuv8rit9-zjdttlgaSrMgfCDQlCJIM/edit#heading=h.9xoxtbgz551y
See the architecture and proposed implementation section. Accordingly, you
should have the following two sections:
1. Use diagrams to explain the high-level control flow in the architecture with
new classes in the picture and how they interoperate/interface with existing
classes (BTW, high-level = not as detailed as the control flow that you have in
the earlier design doc).
2. The details of every class and interface that needs to be introduced or
modified. Focus especially on the interfaces for (1) the heuristic algorithm
and (2) the congestion control.
This will allow me and others to evaluate the architecture more critically.
Then, if needed, we can break up the task into smaller sub-tasks (as was done
in the case of the WAL JIRA,
https://issues.apache.org/jira/browse/SPARK-3129).
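For illustration only, the interface-level detail asked for in point 2 might
look like the following sketch. It is written in Python for brevity, and every
name in it (BatchStats, RateEstimator, CongestionStrategy,
ProportionalEstimator, DropExcess) is hypothetical: none of them comes from the
design doc, the working branch, or Spark itself. The proportional heuristic and
the drop-excess policy are stand-ins chosen to keep the example small, not the
proposed algorithms.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BatchStats:
    """Hypothetical per-batch measurements fed to the heuristic."""
    num_records: int
    processing_time_ms: float
    batch_interval_ms: float


class RateEstimator(ABC):
    """Hypothetical interface for (1) the heuristic algorithm."""

    @abstractmethod
    def estimate(self, stats: BatchStats) -> Optional[float]:
        """Return a bound in records/second, or None if no bound is needed."""


class CongestionStrategy(ABC):
    """Hypothetical interface for (2) the congestion control."""

    @abstractmethod
    def admit(self, requested: int, bound_per_sec: Optional[float],
              interval_ms: float) -> int:
        """Return how many of `requested` records to accept in one interval."""


class ProportionalEstimator(RateEstimator):
    """Stand-in heuristic: bound ingestion by the observed processing rate."""

    def estimate(self, stats: BatchStats) -> Optional[float]:
        # No back-pressure while processing keeps up with the batch interval.
        if stats.processing_time_ms <= stats.batch_interval_ms:
            return None
        # Otherwise bound ingestion by the records/second actually processed.
        return stats.num_records / (stats.processing_time_ms / 1000.0)


class DropExcess(CongestionStrategy):
    """Stand-in policy: accept up to the bound and drop the rest."""

    def admit(self, requested: int, bound_per_sec: Optional[float],
              interval_ms: float) -> int:
        if bound_per_sec is None:
            return requested
        allowed = int(bound_per_sec * interval_ms / 1000.0)
        return min(requested, allowed)
```

Keeping the two concerns behind separate interfaces lets the estimate (how much
data can be handled) be reviewed independently of the policy (what to do with
the excess), which is the kind of separation the second section should make
explicit.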
> Add back-pressure to Spark Streaming
> ------------------------------------
>
> Key: SPARK-7398
> URL: https://issues.apache.org/jira/browse/SPARK-7398
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 1.3.1
> Reporter: François Garillot
> Labels: streams
>
> Spark Streaming has trouble dealing with situations where
> batch processing time > batch interval
> This means the throughput of input data is high relative to Spark's ability
> to remove data from the queue.
> If this throughput is sustained for long enough, it leads to an unstable
> situation where the memory of the Receiver's Executor overflows.
> This issue aims to transmit a back-pressure signal back to data ingestion to
> help deal with that high throughput, in a backwards-compatible way.
> The design doc can be found here:
> https://docs.google.com/document/d/1ZhiP_yBHcbjifz8nJEyPJpHqxB1FT6s8-Zk7sAfayQw/edit?usp=sharing