Aayushmaan Jain created SPARK-27342:
---------------------------------------

             Summary: Optimize limit 0 queries
                 Key: SPARK-27342
                 URL: https://issues.apache.org/jira/browse/SPARK-27342
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Aayushmaan Jain


Limit 0 queries do not require scanning any table files. Currently, however, 
tables referenced by queries containing a LIMIT 0 clause are read in full, 
which causes needless file I/O and network transfer.

For instance:
 * *Select Query*

+QUERY:+ 
{code:java}
SELECT * FROM table1 LIMIT 0
{code}
+OPTIMIZED PLAN:+
{code:java}
GlobalLimit 0
+- LocalLimit 0
   +- Relation[id#79,num1#80] parquet{code}
 * *Inner Join Query*

+QUERY:+
{code:java}
SELECT * FROM table1 INNER JOIN (SELECT * FROM table2 LIMIT 0) AS table2 ON 
table1.id = table2.id{code}
+OPTIMIZED PLAN:+
{code:java}
Join Inner, (id#79 = id#87)
:- Filter isnotnull(id#79)
:  +- Relation[id#79,num1#80] parquet
+- Filter isnotnull(id#87)
   +- GlobalLimit 0
      +- LocalLimit 0
         +- Relation[id#87,num2#88] parquet{code}
Other operations, such as Left Outer Join, Right Outer Join, and Intersect, 
are similarly affected.
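The proposed optimization can be sketched, independent of Spark's actual Catalyst classes, as a bottom-up plan rewrite: any Limit 0 node is replaced with an empty relation that needs no I/O, and an inner join whose side is empty collapses to an empty relation as well. All class and function names below (Relation, Limit, EmptyRelation, eliminate_limit_zero) are simplified, hypothetical stand-ins for Catalyst's GlobalLimit/LocalLimit, LocalRelation, and optimizer rules:
{code:python}
# Hypothetical, Spark-independent sketch of a "limit 0" rewrite.
# Node names are simplified stand-ins for Catalyst operators.
from dataclasses import dataclass


@dataclass
class Relation:          # stand-in for a parquet table scan
    name: str
    columns: tuple


@dataclass
class Limit:             # stand-in for GlobalLimit/LocalLimit
    n: int
    child: object


@dataclass
class EmptyRelation:     # produces zero rows without any file I/O
    columns: tuple


@dataclass
class Join:              # inner join only, for brevity
    left: object
    right: object


def output_columns(plan):
    if isinstance(plan, (Relation, EmptyRelation)):
        return plan.columns
    if isinstance(plan, Limit):
        return output_columns(plan.child)
    return output_columns(plan.left) + output_columns(plan.right)


def eliminate_limit_zero(plan):
    """Bottom-up rewrite: Limit 0 over any subtree becomes an EmptyRelation."""
    if isinstance(plan, Limit):
        child = eliminate_limit_zero(plan.child)
        if plan.n == 0:
            return EmptyRelation(output_columns(child))
        return Limit(plan.n, child)
    if isinstance(plan, Join):
        left = eliminate_limit_zero(plan.left)
        right = eliminate_limit_zero(plan.right)
        # An inner join with an empty side is itself empty.
        if isinstance(left, EmptyRelation) or isinstance(right, EmptyRelation):
            return EmptyRelation(output_columns(left) + output_columns(right))
        return Join(left, right)
    return plan


# SELECT * FROM table1 INNER JOIN (SELECT * FROM table2 LIMIT 0) AS table2 ...
plan = Join(Relation("table1", ("id", "num1")),
            Limit(0, Relation("table2", ("id", "num2"))))
optimized = eliminate_limit_zero(plan)
# The whole join collapses to an EmptyRelation; neither file is read.
{code}
With such a rule in place, the inner-join plan above would reduce to an empty result without scanning either parquet relation.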

As a result, Spark applications waste execution time and network bandwidth.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
