[jira] [Commented] (SPARK-21520) Improvement a special case for non-deterministic projects in optimizer

Apache Spark (JIRA) Mon, 28 Aug 2017 09:56:21 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-21520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144024#comment-16144024
 ]


Apache Spark commented on SPARK-21520:
--------------------------------------

User 'heary-cao' has created a pull request for this issue:
https://github.com/apache/spark/pull/18969

> Improvement a special case for non-deterministic projects in optimizer
> ----------------------------------------------------------------------
>
>                 Key: SPARK-21520
>                 URL: https://issues.apache.org/jira/browse/SPARK-21520
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: caoxuewen
>
> Currently, the optimizer does a lot of special handling for non-deterministic 
> projects and filters, but it is not good enough. This patch adds a new special 
> case for non-deterministic projects, so that the optimizer reads only the 
> fields the user needs when a project is non-deterministic.
> For example, when the fields of a project contain a nondeterministic function 
> (the rand function), the optimizer generates this executedPlan:
> *HashAggregate(keys=[k#403L], functions=[partial_sum(cast(id#402 as 
> bigint))], output=[k#403L, sum#800L])
> +- Project [d004#607 AS id#402, FLOOR((rand(8828525941469309371) * 10000.0)) 
> AS k#403L]
>    +- HiveTableScan [c030#606L, d004#607, d005#608, d025#609, c002#610, 
> d023#611, d024#612, c005#613L, c008#614, c009#615, c010#616, d021#617, 
> d022#618, c017#619, c018#620, c019#621, c020#622, c021#623, c022#624, 
> c023#625, c024#626, c025#627, c026#628, c027#629, ... 169 more fields], 
> MetastoreRelation XXX_database, XXX_table
> HiveTableScan will read all the fields from the table, but we only need 
> ‘d004’. This will affect the performance of the task.
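
The pruning the description asks for can be illustrated with a minimal Python sketch. This is a toy model, not Catalyst code and not the change in the pull request: `Expr`, `required_scan_columns`, and the shortened column list are hypothetical names chosen for the example. It assumes a non-deterministic expression must stay in the project, while the scan only needs to read the columns that the kept expressions actually reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for one aliased expression in a Project."""
    alias: str
    refs: frozenset            # scan columns this expression reads
    deterministic: bool = True


def required_scan_columns(project, consumer_needs, scan_output):
    """Return the scan columns that must be read, in scan order.

    Assumption: non-deterministic expressions are always kept, but they
    do not force the scan to read columns that nothing references.
    """
    kept = [e for e in project
            if e.alias in consumer_needs or not e.deterministic]
    needed = set().union(*(e.refs for e in kept))
    return [c for c in scan_output if c in needed]


# Mirrors the plan above: id reads d004, k is built from rand() only.
project = [
    Expr("id", frozenset({"d004"})),
    Expr("k", frozenset(), deterministic=False),
]
scan_output = ["c030", "d004", "d005", "d025"]  # shortened column list

pruned = required_scan_columns(project, {"id", "k"}, scan_output)
print(pruned)
```

In this model the scan is reduced to the single column that `id` depends on, because the rand-based expression `k` references no table columns at all.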



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

