[jira] [Commented] (DRILL-6574) Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)

ASF GitHub Bot (JIRA) Thu, 19 Jul 2018 01:08:32 -0700


    [ 
https://issues.apache.org/jira/browse/DRILL-6574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548961#comment-16548961
 ]


ASF GitHub Bot commented on DRILL-6574:
---------------------------------------

KazydubB commented on a change in pull request #1386: DRILL-6574: Add option to 
push LIMIT(0) on top of SCAN (late limit 0 optimization)
URL: https://github.com/apache/drill/pull/1386#discussion_r203438425
 
 

 ##########
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/limit/TestLateLimit0Optimization.java
 ##########
 @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.limit;
+
+import org.apache.drill.test.BaseTestQuery;
+import org.apache.drill.PlanTestBase;
+import org.junit.Test;
+
+public class TestLateLimit0Optimization extends BaseTestQuery {
+
+  @Test
+  public void convertFromJson() throws Exception {
+    checkThatQueryIsNotOptimized("SELECT CONVERT_FROM('{x:100, y:215.6}' ,'JSON') AS MYCOL FROM (VALUES(1))");
+  }
+
+  private static void checkThatQueryIsNotOptimized(final String query) throws Exception {
+    PlanTestBase.testPlanMatchingPatterns(wrapLimit0(query),
+        new String[]{},
+        new String[]{
+            ".*Limit\\(offset=\\[0\\], fetch=\\[0\\]\\)(.*[\n\r])+.*Scan.*"
+        });
+  }
+
+  private static String wrapLimit0(final String query) {
+    return "SELECT * FROM (" + query + ") LZT LIMIT 0";
+  }
+
+  @Test
+  public void convertToIntBE() throws Exception {
+    checkThatQueryIsOptimized("SELECT CONVERT_TO(r_regionkey, 'INT_BE') FROM cp.`tpch/region.parquet`");
+  }
+
+  private static void checkThatQueryIsOptimized(final String query) throws Exception {
+    PlanTestBase.testPlanMatchingPatterns(wrapLimit0(query),
 
 Review comment:
   Done.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)
> ----------------------------------------------------------------------
>
>                 Key: DRILL-6574
>                 URL: https://issues.apache.org/jira/browse/DRILL-6574
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Bohdan Kazydub
>            Assignee: Bohdan Kazydub
>            Priority: Major
>              Labels: doc-impacting
>             Fix For: 1.14.0
>
>
> Currently we have the early limit 0 optimization 
> (planner.enable_limit0_optimization), which determines query data types 
> before the actual scan. Since we are not always able to determine data types 
> during planning, we need to add one more option to enable the late limit 0 
> optimization (planner.enable_limit0_on_scan), which exits the query right 
> after the scan. LIMIT(0) on SCAN will be disabled for UNION and complex 
> functions, because they need data to produce the result schema. It would 
> therefore not work for the following functions: KVGEN, MAPPIFY, FLATTEN, 
> CONVERT_FROMJSON, CONVERT_TOJSON, CONVERT_TOSIMPLEJSON, CONVERT_TOEXTENDEDJSON.
> Query plan examples:
> For query
> {code:java}
> SELECT * FROM (
>   SELECT l.l_quantity, l.l_shipdate, o.o_custkey 
>   FROM cp.`tpch/lineitem.parquet` l 
>   JOIN cp.`tpch/orders.parquet` o ON l.l_orderkey = o.o_orderkey 
>   LIMIT 2) 
> LIMIT 0
> {code}
> the plan after the changes looks like:
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY 
> o_custkey): rowcount = 1.0, cumulative cost = \{75183.1 rows, 210559.1 cpu, 
> 0.0 io, 0.0 network, 17.6 memory}, id = 527
> 00-01      Project(l_quantity=[$1], l_shipdate=[$2], o_custkey=[$4]) : 
> rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount 
> = 1.0, cumulative cost = \{75183.0 rows, 210559.0 cpu, 0.0 io, 0.0 network, 
> 17.6 memory}, id = 526
> 00-02        SelectionVectorRemover : rowType = RecordType(ANY l_orderkey, 
> ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 
> 1.0, cumulative cost = \{75182.0 rows, 210556.0 cpu, 0.0 io, 0.0 network, 
> 17.6 memory}, id = 525
> 00-03          Limit(fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY 
> l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, 
> cumulative cost = \{75181.0 rows, 210555.0 cpu, 0.0 io, 0.0 network, 17.6 
> memory}, id = 524
> 00-04            Limit(fetch=[2]) : rowType = RecordType(ANY l_orderkey, ANY 
> l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 2.0, 
> cumulative cost = \{75181.0 rows, 210555.0 cpu, 0.0 io, 0.0 network, 17.6 
> memory}, id = 523
> 00-05              HashJoin(condition=[=($0, $3)], joinType=[inner]) : 
> rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY 
> o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75179.0 rows, 
> 210547.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 522
> 00-07                SelectionVectorRemover : rowType = RecordType(ANY 
> l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 1.0, cumulative cost 
> = \{60176.0 rows, 180526.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 518
> 00-09                  Limit(offset=[0], fetch=[0]) : rowType = 
> RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 1.0, 
> cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 517
> 00-11                    Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], 
> selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`l_orderkey`, `l_quantity`, `l_shipdate`]]]) 
> : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): 
> rowcount = 60175.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 
> 0.0 network, 0.0 memory}, id = 516
> 00-06                SelectionVectorRemover : rowType = RecordType(ANY 
> o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{15001.0 rows, 
> 30001.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 521
> 00-08                  Limit(offset=[0], fetch=[0]) : rowType = 
> RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = 
> \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 520
> 00-10                    Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], 
> selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`o_orderkey`, `o_custkey`]]]) : rowType = 
> RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 15000.0, cumulative 
> cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 519
> {noformat}
> and before changes:
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY 
> o_custkey): rowcount = 1.0, cumulative cost = \{150354.1 rows, 1052637.1 cpu, 
> 0.0 io, 0.0 network, 264000.0 memory}, id = 452
> 00-01      Project(l_quantity=[$1], l_shipdate=[$2], o_custkey=[$4]) : 
> rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount 
> = 1.0, cumulative cost = \{150354.0 rows, 1052637.0 cpu, 0.0 io, 0.0 network, 
> 264000.0 memory}, id = 451
> 00-02        SelectionVectorRemover : rowType = RecordType(ANY l_orderkey, 
> ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 
> 1.0, cumulative cost = \{150353.0 rows, 1052634.0 cpu, 0.0 io, 0.0 network, 
> 264000.0 memory}, id = 450
> 00-03          Limit(fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY 
> l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, 
> cumulative cost = \{150352.0 rows, 1052633.0 cpu, 0.0 io, 0.0 network, 
> 264000.0 memory}, id = 449
> 00-04            Limit(fetch=[2]) : rowType = RecordType(ANY l_orderkey, ANY 
> l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 2.0, 
> cumulative cost = \{150352.0 rows, 1052633.0 cpu, 0.0 io, 0.0 network, 
> 264000.0 memory}, id = 448
> 00-05              HashJoin(condition=[=($0, $3)], joinType=[inner]) : 
> rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY 
> o_orderkey, ANY o_custkey): rowcount = 60175.0, cumulative cost = \{150350.0 
> rows, 1052625.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 447
> 00-07                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], 
> selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`l_orderkey`, `l_quantity`, `l_shipdate`]]]) 
> : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): 
> rowcount = 60175.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 
> 0.0 network, 0.0 memory}, id = 445
> 00-06                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], 
> selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`o_orderkey`, `o_custkey`]]]) : rowType = 
> RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 15000.0, cumulative 
> cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 446
> {noformat}
> Both the early and the late limit 0 optimizations will be enabled by default.
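
The regular expression used in TestLateLimit0Optimization can be tried against plan text without a running Drill instance. The sketch below is only an illustration: the class name Limit0OverScanCheck is hypothetical, the plan fragments are heavily abbreviated from the plans quoted in the issue description, and it assumes the pattern is searched for anywhere in the plan text (how PlanTestBase applies its patterns internally is not shown in this message).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Drill: applies the regular expression from
// the test to plain plan text.
final class Limit0OverScanCheck {

  // Same pattern string as in the test: a Limit(offset=[0], fetch=[0]) line
  // followed, one or more lines later, by a line containing Scan.
  static final Pattern LIMIT0_OVER_SCAN = Pattern.compile(
      ".*Limit\\(offset=\\[0\\], fetch=\\[0\\]\\)(.*[\n\r])+.*Scan.*");

  // Abbreviated fragment of the plan after the change (late limit 0 applied).
  static final String AFTER =
      "00-09  Limit(offset=[0], fetch=[0]) : rowType = RecordType(ANY l_orderkey)\n"
          + "00-11    Scan(groupscan=[ParquetGroupScan [...]])\n";

  // Abbreviated fragment of the plan before the change: only the outer
  // Limit(fetch=[0]) exists, and no limit sits directly above the scans.
  static final String BEFORE =
      "00-03  Limit(fetch=[0]) : rowType = RecordType(ANY l_orderkey)\n"
          + "00-05    HashJoin(condition=[=($0, $3)], joinType=[inner])\n"
          + "00-07      Scan(groupscan=[ParquetGroupScan [...]])\n";

  // Assumption: the pattern may match anywhere in the plan, hence find().
  static boolean hasLimit0OverScan(String plan) {
    return LIMIT0_OVER_SCAN.matcher(plan).find();
  }

  public static void main(String[] args) {
    System.out.println(hasLimit0OverScan(AFTER));   // prints "true"
    System.out.println(hasLimit0OverScan(BEFORE));  // prints "false"
  }
}
```

Note that the outer Limit(fetch=[0]) in the "before" plan does not satisfy the pattern, because the pattern requires the explicit offset=[0] form that appears on the limits placed above the scans.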



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
