Zhenhua Wang created SPARK-22662:
------------------------------------

             Summary: Failed to prune columns after rewriting predicate subquery
                 Key: SPARK-22662
                 URL: https://issues.apache.org/jira/browse/SPARK-22662
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Zhenhua Wang


As a simple example:
{code}
spark-sql> create table base (a int, b int) using parquet;
Time taken: 0.066 seconds
spark-sql> create table relInSubq ( x int, y int, z int) using parquet;
Time taken: 0.042 seconds
spark-sql> explain select a from base where a in (select x from relInSubq);
== Physical Plan ==
*Project [a#83]
+- *BroadcastHashJoin [a#83], [x#85], LeftSemi, BuildRight
   :- *FileScan parquet default.base[a#83,b#84] Batched: true, Format: Parquet, 
Location: InMemoryFileIndex[hdfs://100.0.0.4:9000/wzh/base], PartitionFilters: 
[], PushedFilters: [], ReadSchema: struct<a:int,b:int>
   +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, 
true] as bigint)))
      +- *Project [x#85]
         +- *FileScan parquet default.relinsubq[x#85] Batched: true, Format: 
Parquet, Location: InMemoryFileIndex[hdfs://100.0.0.4:9000/wzh/relinsubq], 
PartitionFilters: [], PushedFilters: [], ReadSchema: struct<x:int>
{code}

We only need column `a` in table `base`, but all columns (`a`, `b`) are fetched.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to