[jira] [Commented] (SPARK-24815) Structured Streaming should support dynamic allocation

Pavan Kotikalapudi (Jira) Fri, 01 Mar 2024 14:31:16 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822722#comment-17822722
 ]


Pavan Kotikalapudi commented on SPARK-24815:
--------------------------------------------

Thanks a lot for mentoring and driving this effort, Mich.

As you suggested, I will update the benefits and challenges in the SPIP doc.
That will outline the scope of the current work and the possibility of future
work for other use cases.

 

Re:  

> Pluggable Dynamic Allocation , Separate Algorithm for Structured Streaming

I really like the idea. I started off with that approach but limited the
change to the core module, since it works at the same primitive level of
evaluation that the current DRA already does. As you said, though, a pluggable
design is better, both design-wise and for supporting different kinds of
workloads.
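
To make the pluggable idea concrete, here is a minimal Python sketch of what a
plug-in point could look like. All names (`LoadSnapshot`, `AllocationPolicy`,
`BacklogPolicy`, `policy_for`) are hypothetical and are not Spark APIs. The
backlog rule only loosely mirrors the batch behavior described in the ticket
(request more executors as the task backlog grows); it is an illustration, not
the actual implementation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class LoadSnapshot:
    """Hypothetical inputs a policy might inspect; not a Spark type."""
    pending_tasks: int
    running_executors: int
    tasks_per_executor: int


class AllocationPolicy(ABC):
    """Hypothetical plug-in point: each workload type supplies its own policy."""

    @abstractmethod
    def target_executors(self, load: LoadSnapshot) -> int:
        """Return the number of executors the policy would like to have."""


class BacklogPolicy(AllocationPolicy):
    """Batch-style rule: size the pool to cover the pending task backlog."""

    def target_executors(self, load: LoadSnapshot) -> int:
        # Ceiling division: executors needed to run every pending task at once.
        needed = -(-load.pending_tasks // load.tasks_per_executor)
        # Never ask for fewer than one executor.
        return max(needed, 1)


# Registry keyed by workload type; a streaming policy would be registered
# here under its own key without touching the batch rule.
POLICIES = {"batch": BacklogPolicy}


def policy_for(workload: str) -> AllocationPolicy:
    """Look up and instantiate the policy registered for a workload type."""
    return POLICIES[workload]()
```

With a registry like this, a separate streaming algorithm becomes one more
entry rather than a change to the batch code path.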

 

> Warning for Enabled Core Dynamic Allocation

Right now we need normal DRA because structured streaming DRA is built on top
of it. I have added another flag, `spark.dynamicAllocation.streaming.enabled`,
so that the streaming-specific pieces of the algorithm kick in on top of
traditional DRA. This approach also keeps the change backwards compatible,
especially when users upgrade Spark.
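
As a rough illustration of how the two flags would be combined, the Python
sketch below builds the settings and renders them as `spark-submit --conf`
arguments. The helper names (`streaming_dra_conf`, `to_submit_args`) are
hypothetical. Note that `spark.dynamicAllocation.streaming.enabled` is the
flag proposed in this work and is an assumption here; it is not available in
released Spark unless the change is merged.

```python
def streaming_dra_conf(enable_streaming: bool = True) -> dict:
    """Build the settings: core DRA always on, streaming layer opt-in."""
    # Core dynamic allocation must be enabled, since the streaming
    # algorithm is layered on top of it.
    conf = {"spark.dynamicAllocation.enabled": "true"}
    if enable_streaming:
        # Proposed flag from this ticket (assumption: not in released Spark).
        conf["spark.dynamicAllocation.streaming.enabled"] = "true"
    return conf


def to_submit_args(conf: dict) -> list:
    """Render a settings dict as a list of spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(streaming_dra_conf())))
```

Leaving the streaming flag off yields only the core DRA setting, which is the
backwards-compatible behavior existing jobs get after an upgrade.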

 

> Structured Streaming should support dynamic allocation
> ------------------------------------------------------
>
>                 Key: SPARK-24815
>                 URL: https://issues.apache.org/jira/browse/SPARK-24815
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler, Spark Core, Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Karthik Palaniappan
>            Priority: Minor
>              Labels: pull-request-available
>
> For batch jobs, dynamic allocation is very useful for adding and removing 
> containers to match the actual workload. On multi-tenant clusters, it ensures 
> that a Spark job is taking no more resources than necessary. In cloud 
> environments, it enables autoscaling.
> However, if you set spark.dynamicAllocation.enabled=true and run a structured 
> streaming job, the batch dynamic allocation algorithm kicks in. It requests 
> more executors if the task backlog is a certain size, and removes executors 
> if they idle for a certain period of time.
> Quick thoughts:
> 1) Dynamic allocation should be pluggable, rather than hardcoded to a 
> particular implementation in SparkContext.scala (this should be a separate 
> JIRA).
> 2) We should make a structured streaming algorithm that's separate from the 
> batch algorithm. Eventually, continuous processing might need its own 
> algorithm.
> 3) Spark should print a warning if you run a structured streaming job when 
> Core's dynamic allocation is enabled


