[
https://issues.apache.org/jira/browse/SPARK-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580403#comment-14580403
]
Li Sheng commented on SPARK-8287:
---------------------------------
Sorry , see this:
scala> sql("select * from dw.src_view_1 where my_ds='2'
").queryExecution.optimizedPlan
15/06/10 19:08:59 INFO ParseDriver: Parsing command: select * from
dw.src_view_1 where my_ds='2'
15/06/10 19:08:59 INFO ParseDriver: Parse Completed
15/06/10 19:08:59 INFO HiveMetaStore: 0: get_table : db=dw tbl=src_view_1
15/06/10 19:08:59 INFO audit: ugi=shengli ip=unknown-ip-addr
cmd=get_table : db=dw tbl=src_view_1
15/06/10 19:08:59 INFO ParseDriver: Parsing command: select sum(`a`.`key`)
`s1`, sum(`b`.`key`) `s2`, `b`.`ds` `my_ds` from `dw`.`src_partitioned1` `a`
join `dw`.`src_partitioned2` `b` on `a`.`ds` = `b`.`ds` group by `b`.`ds`
15/06/10 19:08:59 INFO ParseDriver: Parse Completed
15/06/10 19:08:59 INFO HiveMetaStore: 0: get_table : db=dw tbl=src_partitioned1
15/06/10 19:08:59 INFO audit: ugi=shengli ip=unknown-ip-addr
cmd=get_table : db=dw tbl=src_partitioned1
15/06/10 19:08:59 INFO HiveMetaStore: 0: get_partitions : db=dw
tbl=src_partitioned1
15/06/10 19:08:59 INFO audit: ugi=shengli ip=unknown-ip-addr
cmd=get_partitions : db=dw tbl=src_partitioned1
15/06/10 19:08:59 INFO HiveMetaStore: 0: get_table : db=dw tbl=src_partitioned2
15/06/10 19:08:59 INFO audit: ugi=shengli ip=unknown-ip-addr
cmd=get_table : db=dw tbl=src_partitioned2
15/06/10 19:08:59 INFO HiveMetaStore: 0: get_partitions : db=dw
tbl=src_partitioned2
15/06/10 19:08:59 INFO audit: ugi=shengli ip=unknown-ip-addr
cmd=get_partitions : db=dw tbl=src_partitioned2
res254: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Filter (my_ds#918 = 2)
Aggregate [ds#922], [SUM(CAST(key#920, LongType)) AS s1#916L,SUM(CAST(key#923,
LongType)) AS s2#917L,ds#922 AS my_ds#918]
Project [ds#922,key#920,key#923]
Join Inner, Some((ds#919 = ds#922))
Project [key#920,ds#919]
MetastoreRelation dw, src_partitioned1, Some(a)
Project [ds#922,key#923]
MetastoreRelation dw, src_partitioned2, Some(b)
> Filter not push down through Subquery or View
> ---------------------------------------------
>
> Key: SPARK-8287
> URL: https://issues.apache.org/jira/browse/SPARK-8287
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.3.1
> Reporter: Li Sheng
> Fix For: 1.4.0
>
> Original Estimate: 40h
> Remaining Estimate: 40h
>
> Filter not push down through Subquery or View. Assume we have two big
> partitioned table join inner a Subquery or a View and filter not push down,
> this will cause a full partition join and will cause performance issues.
> Let me give and example that can reproduce the problem:
> {code:sql}
> create table src(key int, value string);
> -- Creates partitioned table and imports data
> CREATE TABLE src_partitioned1 (key int, value STRING) PARTITIONED BY (ds
> STRING);
> insert overwrite table src_partitioned1 PARTITION (ds='1') select key, value
> from src;
> insert overwrite table src_partitioned1 PARTITION (ds='2') select key, value
> from src;
> CREATE TABLE src_partitioned2 (key int, value STRING) PARTITIONED BY (ds
> STRING);
> insert overwrite table src_partitioned2 PARTITION (ds='1') select key, value
> from src;
> insert overwrite table src_partitioned2 PARTITION (ds='2') select key, value
> from src;
> -- Creates views
> create view src_view as select sum(a.key) s1, sum(b.key) s2, b.ds ds from
> src_partitioned1 a join src_partitioned2 b on a.ds = b.ds group by b.ds
> create view src_view_1 as select sum(a.key) s1, sum(b.key) s2, b.ds my_ds
> from src_partitioned1 a join src_partitioned2 b on a.ds = b.ds group by b.ds
> -- QueryExecution
> select * from dw.src_view where ds='2'
> {code}
> {noformat}
> sql("select * from dw.src_view where ds='2' ").queryExecution
> == Parsed Logical Plan ==
> 'Project [*]
> 'Filter ('ds = 2)
> 'UnresolvedRelation [dw,src_view], None
> == Analyzed Logical Plan ==
> Project [s1#60L,s2#61L,ds#62]
> Filter (ds#62 = 2)
> Subquery src_view
> Aggregate [ds#66], [SUM(CAST(key#64, LongType)) AS s1#60L,SUM(CAST(key#67,
> LongType)) AS s2#61L,ds#66 AS ds#62]
> Join Inner, Some((ds#63 = ds#66))
> MetastoreRelation dw, src_partitioned1, Some(a)
> MetastoreRelation dw, src_partitioned2, Some(b)
> == Optimized Logical Plan ==
> Filter (ds#62 = 2)
> Aggregate [ds#66], [SUM(CAST(key#64, LongType)) AS s1#60L,SUM(CAST(key#67,
> LongType)) AS s2#61L,ds#66 AS ds#62]
> Project [ds#66,key#64,key#67]
> Join Inner, Some((ds#63 = ds#66))
> Project [key#64,ds#63]
> MetastoreRelation dw, s...
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]