[ 
https://issues.apache.org/jira/browse/SPARK-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-12616:
--------------------------------
    Description: 
Union logical plan is a binary node. However, a typical use case for union is 
to union a very large number of input sources (DataFrames, RDDs, or files). It 
is not uncommon to union hundreds of thousands of files. In this case, our 
optimizer can become very slow due to the large number of logical unions. We 
should change the Union logical plan to support an arbitrary number of 
children, and add a single rule in the optimizer (or analyzer?) to collapse all 
adjacent Unions into one.

Note that this problem doesn't exist in the physical plan, because the physical 
Union already supports an arbitrary number of children.




  was:
Union logical plan is a binary node. However, a typical use case for union is 
to union a very large number of input sources (DataFrames, RDDs, or files). In 
this case, our optimizer can become very slow due to the large number of 
logical unions. We should change the Union logical plan to support an arbitrary 
number of children, and add a single rule in the optimizer (or analyzer?) to 
collapse all adjacent Unions into one.

Note that this problem doesn't exist in physical plan, because the physical 
Union already supports arbitrary number of children.





> Union logical plan should support arbitrary number of children (rather than 
> binary)
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-12616
>                 URL: https://issues.apache.org/jira/browse/SPARK-12616
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Reynold Xin
>
> Union logical plan is a binary node. However, a typical use case for union is 
> to union a very large number of input sources (DataFrames, RDDs, or files). 
> It is not uncommon to union hundreds of thousands of files. In this case, our 
> optimizer can become very slow due to the large number of logical unions. We 
> should change the Union logical plan to support an arbitrary number of 
> children, and add a single rule in the optimizer (or analyzer?) to collapse 
> all adjacent Unions into one.
> Note that this problem doesn't exist in the physical plan, because the physical 
> Union already supports an arbitrary number of children.
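
The collapsing rule proposed above can be illustrated with a small toy model.
The sketch below is not Spark code: the classes Relation, BinaryUnion and
NaryUnion and the function collapse_unions are hypothetical names invented for
this example, and it assumes that any node which is not a union is a leaf. It
only shows how a deep chain of binary unions might be flattened into a single
node with many children.

```python
from dataclasses import dataclass
from typing import Tuple, Union as AnyOf


# Hypothetical toy plan nodes (not Spark's actual classes).
@dataclass(frozen=True)
class Relation:
    name: str


@dataclass(frozen=True)
class BinaryUnion:
    left: "Plan"
    right: "Plan"


@dataclass(frozen=True)
class NaryUnion:
    children: Tuple["Plan", ...]


Plan = AnyOf[Relation, BinaryUnion, NaryUnion]


def collapse_unions(plan: Plan) -> Plan:
    """Flatten all adjacent union nodes rooted at `plan` into one NaryUnion.

    An explicit stack is used instead of recursion so that a chain of
    many thousands of nested unions does not hit the recursion limit.
    """
    if not isinstance(plan, (BinaryUnion, NaryUnion)):
        return plan  # nothing to collapse at this root

    inputs = []
    stack = [plan]
    while stack:
        node = stack.pop()
        if isinstance(node, BinaryUnion):
            # Push right first so the left input is visited first.
            stack.append(node.right)
            stack.append(node.left)
        elif isinstance(node, NaryUnion):
            stack.extend(reversed(node.children))
        else:
            inputs.append(node)  # a non-union input, kept in order
    return NaryUnion(tuple(inputs))


# Build a left-deep chain of binary unions over 1000 toy relations.
plan = Relation("t0")
for i in range(1, 1000):
    plan = BinaryUnion(plan, Relation(f"t{i}"))

flat = collapse_unions(plan)
print(len(flat.children))
```

In this toy model the optimizer would see one union node with 1000 children
instead of 999 nested binary nodes, and the left-to-right order of the inputs
is preserved.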


