[ 
https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852541#comment-16852541
 ] 

Karthik Palaniappan edited comment on SPARK-24815 at 5/31/19 2:26 AM:
----------------------------------------------------------------------

I was starting to update the JIRA description with a problem statement, then 
realized I am unfamiliar with some of the challenges you guys mentioned in the 
comments, in particular how state is managed in structured streaming.

I was imagining that processing rate was the correct heuristic, assuming the 
goal is just to keep up with the input, even at the expense of processing time. 
Continuous processing seems to solve the separate case where you need 
ultra-low-latency processing.
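
A rough sketch of what a processing-rate heuristic could look like is below. 
It is only an illustration under stated assumptions, not a proposed design: 
the function name, the bounds, and the assumption that throughput scales 
linearly with executor count are all hypothetical, and it ignores the state 
management concerns raised in the comments. The two rates are assumed to be 
measurements like the inputRowsPerSecond and processedRowsPerSecond values 
that a streaming query reports in its progress.

```python
import math


def target_executors(current_executors: int,
                     input_rows_per_sec: float,
                     processed_rows_per_sec: float,
                     min_executors: int = 1,
                     max_executors: int = 100) -> int:
    """Hypothetical heuristic: pick an executor count that lets the query
    keep up with its input, based on the input/processing rate ratio."""
    if processed_rows_per_sec <= 0:
        # No usable throughput measurement yet; leave the allocation alone.
        return current_executors
    ratio = input_rows_per_sec / processed_rows_per_sec
    # Assumption: throughput scales roughly linearly with executor count.
    desired = math.ceil(current_executors * ratio)
    # Clamp to the configured bounds.
    return max(min_executors, min(max_executors, desired))
```

With 4 executors, an input rate twice the processing rate asks for 8 
executors, and an input rate half the processing rate asks for 2.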

[~skonto] [~kabhwan] [~gsomogyi] if you guys help with a design, I'd be happy 
to help with the implementation, but for now I will drop this JIRA.


was (Author: karthik palaniappan):
I was starting to update the JIRA description with a problem statement, then 
realized I am unfamiliar with some of the challenges you guys mentioned in the 
comments, in particular how state is managed in structured streaming.

I was imagining that processing rate was the correct heuristic, assuming the 
goal is to keep up with the input. Continuous processing seems to solve the 
separate case where you need ultra low latency.

[~skonto] [~kabhwan] [~gsomogyi] if you guys help with a design, I'd be happy 
to help with the implementation, but for now I will drop this JIRA.

> Structured Streaming should support dynamic allocation
> ------------------------------------------------------
>
>                 Key: SPARK-24815
>                 URL: https://issues.apache.org/jira/browse/SPARK-24815
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler, Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Karthik Palaniappan
>            Priority: Minor
>
> For batch jobs, dynamic allocation is very useful for adding and removing 
> containers to match the actual workload. On multi-tenant clusters, it ensures 
> that a Spark job is taking no more resources than necessary. In cloud 
> environments, it enables autoscaling.
> However, if you set spark.dynamicAllocation.enabled=true and run a Structured 
> Streaming job, the batch dynamic allocation algorithm kicks in. It requests 
> more executors if the task backlog is a certain size, and removes executors 
> if they sit idle for a certain period of time.
> Quick thoughts:
> 1) Dynamic allocation should be pluggable, rather than hardcoded to a 
> particular implementation in SparkContext.scala (this should be a separate 
> JIRA).
> 2) We should make a structured streaming algorithm that's separate from the 
> batch algorithm. Eventually, continuous processing might need its own 
> algorithm.
> 3) Spark should print a warning if you run a structured streaming job when 
> Core's dynamic allocation is enabled.
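
For readers unfamiliar with the batch algorithm described in the issue, the 
following is a simplified model of it, not Spark's actual implementation. The 
names ExecutorState and batch_allocation_step are hypothetical. The two 
timeouts stand in for the spark.dynamicAllocation.schedulerBacklogTimeout and 
spark.dynamicAllocation.executorIdleTimeout settings, and the model assumes 
the request is triggered by how long tasks have been backlogged.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ExecutorState:
    executor_id: str
    idle_seconds: float  # how long this executor has had no running tasks


def batch_allocation_step(pending_tasks: int,
                          backlog_seconds: float,
                          executors: List[ExecutorState],
                          backlog_timeout: float = 1.0,
                          idle_timeout: float = 60.0) -> Tuple[bool, List[str]]:
    """Simplified model: return (request_more, executor_ids_to_remove)."""
    # Scale up when tasks have been waiting longer than the backlog timeout.
    request_more = pending_tasks > 0 and backlog_seconds >= backlog_timeout
    # Scale down by releasing executors idle longer than the idle timeout.
    to_remove = [e.executor_id for e in executors
                 if e.idle_seconds >= idle_timeout]
    return request_more, to_remove
```

Neither decision looks at how fast input is arriving, which is the gap a 
streaming-specific algorithm (point 2 above) would need to address.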



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
