[ 
https://issues.apache.org/jira/browse/SPARK-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487323#comment-14487323
 ] 

Saisai Shao commented on SPARK-6691:
------------------------------------

Hi [~tdas], thanks a lot for your comments. This proposal not only adds a new 
dynamic RateLimiter but also refactors the previous interface to offer a 
uniform solution for receiver-based input streams as well as direct input 
streams. It also keeps backward compatibility: the default logic stays the 
same as in the previous code. So I think this refactoring makes sense.

Regarding the dynamic RateLimiter, I think your suggestion is quite 
meaningful: we need a good design that balances stability and throughput. My 
current design is simple and straightforward, and we still need to polish it. 
But from my point of view, stability matters more than throughput for 
production use: production workloads seldom saturate the network bandwidth of 
each receiver, whereas instability is critical. Currently Spark Streaming is 
vulnerable to processing delay. The delay accumulates and is hard to recover 
from once we meet an ingestion burst, which is quite normal in production 
environments, especially for online services. A dynamic RateLimiter could 
address this problem well, so from this point of view it is quite meaningful.
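
To make the delay accumulation concrete, here is a minimal sketch of a toy 
queueing model. It is not Spark code: the function name, the constant-capacity 
assumption, and the numbers are all hypothetical, and the model ignores what 
happens to records that the rate limit holds back at the source.

```python
def scheduling_delays(arrivals, capacity, batch_interval=1.0, rate_limit=None):
    """Scheduling delay (seconds) seen by each batch before it starts.

    arrivals:    records arriving in each batch interval
    capacity:    records the cluster can process per second (assumed constant)
    rate_limit:  optional cap on records admitted per second
    """
    delays, delay = [], 0.0
    for n in arrivals:
        if rate_limit is not None:
            # Admit at most rate_limit records per second of batch interval.
            n = min(n, rate_limit * batch_interval)
        delays.append(delay)
        processing_time = n / capacity
        # A batch that takes longer than the interval pushes later batches back;
        # a faster batch lets the system catch up, but never below zero delay.
        delay = max(0.0, delay + processing_time - batch_interval)
    return delays


# Two burst intervals (3000 records) in an otherwise steady stream of 800.
arrivals = [800, 800, 3000, 3000, 800, 800, 800]
unlimited = scheduling_delays(arrivals, capacity=1000)
limited = scheduling_delays(arrivals, capacity=1000, rate_limit=1000)
```

In this model the two burst intervals leave the unlimited run several seconds 
behind, and each later batch wins back only the small gap between the batch 
interval and its processing time. The rate-limited run never builds up a delay.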

My design of the dynamic RateLimiter may not be very sophisticated. I bring 
it up here just to show one possible solution to these issues, so that we can 
improve on it. I will continue to do benchmarking and research on different 
scenarios. Thanks a lot for your suggestions.

> Abstract and add a dynamic RateLimiter for Spark Streaming
> ----------------------------------------------------------
>
>                 Key: SPARK-6691
>                 URL: https://issues.apache.org/jira/browse/SPARK-6691
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.3.0
>            Reporter: Saisai Shao
>
> Flow control (or rate control) for input data is very important in streaming 
> systems, especially for Spark Streaming to stay stable and up-to-date. An 
> unexpected flood of incoming data, or an ingestion rate that is beyond the 
> computation power of the cluster, will make the system unstable and increase 
> the delay. Because of Spark Streaming’s job generation and processing 
> pattern, this delay accumulates and introduces unacceptable exceptions.
> ----
> Currently, in Spark Streaming’s receiver-based input stream, there is a 
> RateLimiter in BlockGenerator which controls the ingestion rate of input 
> data, but the current implementation has several limitations:
> # The max ingestion rate is set by the user through configuration 
> beforehand; the user may lack the experience to choose an appropriate value 
> before the application is running.
> # This configuration is fixed for the lifetime of the application, which 
> means you need to plan for the worst-case scenario to set a reasonable 
> value.
> # Input streams like DirectKafkaInputStream need to maintain a separate 
> solution to achieve the same functionality.
> # The lack of slow-start control makes it easy for the whole system to get 
> trapped in large processing and scheduling delays at the very beginning.
> ----
> So here we propose a new dynamic RateLimiter, as well as a new interface for 
> the RateLimiter, to improve the whole system's stability. The goals are:
> * Dynamically adjust the ingestion rate according to the processing rate of 
> previously finished jobs.
> * Offer a uniform solution not only for receiver-based input streams, but 
> also for direct streams like DirectKafkaInputStream and new ones.
> * Slow-start the rate to control network congestion when a job is started.
> * Provide a pluggable framework to make maintenance and extension easier.
> ----
> Here is the design doc 
> (https://docs.google.com/document/d/1lqJDkOYDh_9hRLQRwqvBXcbLScWPmMa7MlG8J_TE93w/edit?usp=sharing)
>  and working branch 
> (https://github.com/jerryshao/apache-spark/tree/dynamic-rate-limiter).
> Any comment would be greatly appreciated.
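
The goals in the description (adjust the ingestion rate from the processing 
rate of finished batches, and slow-start it) can be sketched as below. This is 
only an illustration under stated assumptions: the class name, its parameters, 
and the doubling policy are hypothetical, and it is not taken from the linked 
design doc, the working branch, or any Spark API.

```python
from dataclasses import dataclass


@dataclass
class DynamicRateEstimator:
    """Hypothetical estimator; names and policy are illustrative only."""
    initial_rate: float          # records/sec for the first batches (slow start)
    max_rate: float              # upper bound, e.g. a user-configured ceiling
    growth_factor: float = 2.0   # multiplicative ramp-up while the system keeps up

    def __post_init__(self):
        self.rate = min(self.initial_rate, self.max_rate)

    def update(self, records, processing_secs, batch_interval_secs):
        """Return the new ingestion rate after one finished batch."""
        if records <= 0 or processing_secs <= 0:
            return self.rate  # nothing to learn from an empty batch
        processing_rate = records / processing_secs
        if processing_secs > batch_interval_secs:
            # Falling behind: cap ingestion at what the cluster sustained.
            self.rate = min(processing_rate, self.max_rate)
        else:
            # Keeping up: ramp up, but never beyond the measured processing
            # rate or the configured ceiling.
            self.rate = min(self.rate * self.growth_factor,
                            processing_rate, self.max_rate)
        return self.rate


est = DynamicRateEstimator(initial_rate=100, max_rate=10000)
r1 = est.update(records=100, processing_secs=0.5, batch_interval_secs=1.0)
r2 = est.update(records=200, processing_secs=0.5, batch_interval_secs=1.0)
r3 = est.update(records=400, processing_secs=2.0, batch_interval_secs=1.0)
```

Here the rate ramps up while batches finish within the interval (r1, r2) and 
drops back to the measured processing rate once a batch overruns it (r3). A 
real policy would likely also smooth over several batches, which this sketch 
does not attempt.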



