[jira] [Commented] (SPARK-23128) A new approach to do adaptive execution in Spark SQL

Wenchen Fan (Jira) Wed, 08 Apr 2020 06:38:55 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-23128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078283#comment-17078283
 ]


Wenchen Fan commented on SPARK-23128:
-------------------------------------

Yes, they are.
https://issues.apache.org/jira/browse/SPARK-28177
https://issues.apache.org/jira/browse/SPARK-29544

> A new approach to do adaptive execution in Spark SQL
> ----------------------------------------------------
>
>                 Key: SPARK-23128
>                 URL: https://issues.apache.org/jira/browse/SPARK-23128
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Carson Wang
>            Assignee: Carson Wang
>            Priority: Major
>             Fix For: 3.0.0
>
>         Attachments: AdaptiveExecutioninBaidu.pdf
>
>
> SPARK-9850 proposed the basic idea of adaptive execution in Spark. In 
> DAGScheduler, a new API is added to support submitting a single map stage.  
> The current implementation of adaptive execution in Spark SQL supports 
> changing the reducer number at runtime. An Exchange coordinator is used to 
> determine the number of post-shuffle partitions for a stage that needs to 
> fetch shuffle data from one or multiple stages. The current implementation 
> adds ExchangeCoordinator while we are adding Exchanges. However there are 
> some limitations. First, it may cause additional shuffles that may decrease 
> the performance. We can see this from EnsureRequirements rule when it adds 
> ExchangeCoordinator.  Secondly, it is not a good idea to add 
> ExchangeCoordinators while we are adding Exchanges because we don’t have a 
> global picture of all shuffle dependencies of a post-shuffle stage. I.e. for 
> 3 tables’ join in a single stage, the same ExchangeCoordinator should be used 
> in three Exchanges but currently two separated ExchangeCoordinator will be 
> added. Thirdly, with the current framework it is not easy to implement other 
> features in adaptive execution flexibly like changing the execution plan and 
> handling skewed join at runtime.
> We'd like to introduce a new way to do adaptive execution in Spark SQL and 
> address the limitations. The idea is described at 
> [https://docs.google.com/document/d/1mpVjvQZRAkD-Ggy6-hcjXtBPiQoVbZGe3dLnAKgtJ4k/edit?usp=sharing]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-23128) A new approach to do adaptive execution in Spark SQL

Reply via email to