Zhenhua Wang created SPARK-22662:
------------------------------------
Summary: Failed to prune columns after rewriting predicate subquery
Key: SPARK-22662
URL: https://issues.apache.org/jira/browse/SPARK-22662
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.3.0
Reporter: Zhenhua Wang
A simple example that reproduces the problem:
{code}
spark-sql> create table base (a int, b int) using parquet;
Time taken: 0.066 seconds
spark-sql> create table relInSubq ( x int, y int, z int) using parquet;
Time taken: 0.042 seconds
spark-sql> explain select a from base where a in (select x from relInSubq);
== Physical Plan ==
*Project [a#83]
+- *BroadcastHashJoin [a#83], [x#85], LeftSemi, BuildRight
   :- *FileScan parquet default.base[a#83,b#84] Batched: true, Format: Parquet, Location: InMemoryFileIndex[hdfs://100.0.0.4:9000/wzh/base], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:int,b:int>
   +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)))
      +- *Project [x#85]
         +- *FileScan parquet default.relinsubq[x#85] Batched: true, Format: Parquet, Location: InMemoryFileIndex[hdfs://100.0.0.4:9000/wzh/relinsubq], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<x:int>
{code}
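Only column `a` of table `base` is needed, but the scan of `base` fetches all columns (`a`, `b`): its ReadSchema is struct<a:int,b:int>. The scan of `relInSubq`, by contrast, is pruned correctly to struct<x:int>.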
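
One way to check for this symptom mechanically is to read the ReadSchema of each FileScan node out of the explain text and compare it with the columns the query actually needs. The following is only a rough sketch in Python, not part of Spark: the helper names `read_schemas` and `unpruned_columns`, the regular expression, and the `needed` mapping are all assumptions made for illustration. The plan text is copied from the output above with the wrapped lines rejoined, and the parsing assumes flat schemas such as struct<a:int,b:int> (nested types would need a real parser).

```python
import re

# Plan text copied from this report, with the wrapped lines rejoined.
PLAN = """\
*Project [a#83]
+- *BroadcastHashJoin [a#83], [x#85], LeftSemi, BuildRight
   :- *FileScan parquet default.base[a#83,b#84] Batched: true, Format: Parquet, Location: InMemoryFileIndex[hdfs://100.0.0.4:9000/wzh/base], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:int,b:int>
   +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)))
      +- *Project [x#85]
         +- *FileScan parquet default.relinsubq[x#85] Batched: true, Format: Parquet, Location: InMemoryFileIndex[hdfs://100.0.0.4:9000/wzh/relinsubq], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<x:int>
"""

# Hypothetical pattern: captures the table name and the flat ReadSchema body
# of a FileScan line as printed in the plan above.
SCAN_RE = re.compile(
    r"FileScan \w+ (?P<table>[\w.]+)\[.*?ReadSchema: struct<(?P<schema>[^>]*)>"
)


def read_schemas(plan_text):
    """Map each scanned table to the column names listed in its ReadSchema."""
    result = {}
    for line in plan_text.splitlines():
        match = SCAN_RE.search(line)
        if match:
            fields = match.group("schema").split(",")
            result[match.group("table")] = [f.split(":")[0] for f in fields if f]
    return result


def unpruned_columns(plan_text, needed):
    """Return {table: surplus columns} for scans that read more than needed.

    Tables absent from `needed` are not checked.
    """
    surplus = {}
    for table, cols in read_schemas(plan_text).items():
        wanted = needed.get(table, cols)
        extra = [c for c in cols if c not in wanted]
        if extra:
            surplus[table] = extra
    return surplus


if __name__ == "__main__":
    # Columns the example query needs from each table (stated by hand).
    needed = {"default.base": ["a"], "default.relinsubq": ["x"]}
    print(unpruned_columns(PLAN, needed))
```

For the plan in this report, the check flags column `b` of `default.base` and nothing for `default.relinsubq`, which matches the observation above.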