[ https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822722#comment-17822722 ]
Pavan Kotikalapudi commented on SPARK-24815:
--------------------------------------------

Thanks a lot for mentoring and driving this effort, Mich.

As you suggested, I will update the benefits and challenges in the SPIP doc. That can outline the scope of the current work and the possibility of future work for other use cases.

Re:

> Pluggable Dynamic Allocation, Separate Algorithm for Structured Streaming

I really like the idea. I started off with that but limited the change to the core module only, since it works at the same primitive level of evaluation that the current DRA is already doing. But, as you said, this idea is better design-wise and also for different kinds of workloads.

> Warning for Enabled Core Dynamic Allocation

Right now we need the normal DRA because the structured streaming DRA is built on top of it. I have added another flag, `spark.dynamicAllocation.streaming.enabled`, so that these particular pieces of the streaming algorithm kick in on top of the traditional DRA. This approach also keeps it backwards compatible, especially when users have to upgrade Spark.

> Structured Streaming should support dynamic allocation
> ------------------------------------------------------
>
>                 Key: SPARK-24815
>                 URL: https://issues.apache.org/jira/browse/SPARK-24815
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler, Spark Core, Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Karthik Palaniappan
>            Priority: Minor
>              Labels: pull-request-available
>
> For batch jobs, dynamic allocation is very useful for adding and removing
> containers to match the actual workload. On multi-tenant clusters, it ensures
> that a Spark job is taking no more resources than necessary. In cloud
> environments, it enables autoscaling.
> However, if you set spark.dynamicAllocation.enabled=true and run a structured
> streaming job, the batch dynamic allocation algorithm kicks in.
> It requests more executors if the task backlog is a certain size, and removes
> executors if they idle for a certain period of time.
> Quick thoughts:
> 1) Dynamic allocation should be pluggable, rather than hardcoded to a
> particular implementation in SparkContext.scala (this should be a separate
> JIRA).
> 2) We should make a structured streaming algorithm that's separate from the
> batch algorithm. Eventually, continuous processing might need its own
> algorithm.
> 3) Spark should print a warning if you run a structured streaming job when
> Core's dynamic allocation is enabled



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
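
The flag layering described in the comment (the streaming algorithm taking effect only on top of core dynamic allocation) can be sketched roughly as follows. This is a minimal illustration, not Spark's implementation: `streaming_dra_active` and `_is_true` are hypothetical helpers, `spark.dynamicAllocation.streaming.enabled` is the flag proposed in the comment rather than a setting confirmed in any released Spark version, and treating a missing streaming flag as false is an assumption.

```python
# Hypothetical sketch, not part of Spark: models how the two flags layer.
CORE_FLAG = "spark.dynamicAllocation.enabled"
# Flag name taken from the comment; proposed, not confirmed in a release.
STREAMING_FLAG = "spark.dynamicAllocation.streaming.enabled"


def _is_true(conf, key):
    """Treat a missing key as false (an assumption); compare case-insensitively."""
    return str(conf.get(key, "false")).strip().lower() == "true"


def streaming_dra_active(conf):
    """The streaming algorithm applies only when core DRA is enabled as well."""
    return _is_true(conf, CORE_FLAG) and _is_true(conf, STREAMING_FLAG)


# A job that sets only the core flag keeps the existing batch behaviour,
# which is what makes the extra flag backwards compatible.
legacy_conf = {CORE_FLAG: "true"}
streaming_conf = {CORE_FLAG: "true", STREAMING_FLAG: "true"}

print(streaming_dra_active(legacy_conf))
print(streaming_dra_active(streaming_conf))
```

Under these assumptions, setting the streaming flag without the core flag has no effect, which mirrors the statement that the structured streaming DRA is built on top of the normal DRA.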