[
https://issues.apache.org/jira/browse/DRILL-6574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bridget Bevens updated DRILL-6574:
----------------------------------
Labels: doc-complete ready-to-commit (was: doc-impacting ready-to-commit)
> Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)
> ----------------------------------------------------------------------
>
> Key: DRILL-6574
> URL: https://issues.apache.org/jira/browse/DRILL-6574
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.13.0
> Reporter: Bohdan Kazydub
> Assignee: Bohdan Kazydub
> Priority: Major
> Labels: doc-complete, ready-to-commit
> Fix For: 1.14.0
>
>
> Currently we have early limit 0 optimization
> (planner.enable_limit0_optimization) which determines query data types before
> actual scan. Since we not always able to determine data type during planning,
> we need to add one more option to enable late limit 0 optimization
> (planner.enable_limit0_on_scan, exit query right after scan. LIMIT(0) on SCAN
> for UNION and complex functions will be disabled i.e. UNION and complex
> functions need data to produce result schema. This would not work for the
> following list of functions: KVGEN, MAPPIFY, FLATTEN, CONVERT_FROMJSON,
> CONVERT_TOJSON, CONVERT_TOSIMPLEJSON, CONVERT_TOEXTENDEDJSON.
> Query plan examples:
> For query
> {code:java}
> SELECT * FROM (
> SELECT l.l_quantity, l.l_shipdate, o.o_custkey
> FROM cp.`tpch/lineitem.parquet` l
> JOIN cp.`tpch/orders.parquet` o ON l.l_orderkey = o.o_orderkey
> LIMIT 2)
> LIMIT 0
> {code}
> {color:#6a8759}plan after changes looks like{color}
> {noformat}
> 00-00 Screen : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY
> o_custkey): rowcount = 1.0, cumulative cost = \{75183.1 rows, 210559.1 cpu,
> 0.0 io, 0.0 network, 17.6 memory}, id = 527
> 00-01 Project(l_quantity=[$1], l_shipdate=[$2], o_custkey=[$4]) :
> rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount
> = 1.0, cumulative cost = \{75183.0 rows, 210559.0 cpu, 0.0 io, 0.0 network,
> 17.6 memory}, id = 526
> 00-02 SelectionVectorRemover : rowType = RecordType(ANY l_orderkey,
> ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount =
> 1.0, cumulative cost = \{75182.0 rows, 210556.0 cpu, 0.0 io, 0.0 network,
> 17.6 memory}, id = 525
> 00-03 Limit(fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY
> l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0,
> cumulative cost = \{75181.0 rows, 210555.0 cpu, 0.0 io, 0.0 network, 17.6
> memory}, id = 524
> 00-04 Limit(fetch=[2]) : rowType = RecordType(ANY l_orderkey, ANY
> l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 2.0,
> cumulative cost = \{75181.0 rows, 210555.0 cpu, 0.0 io, 0.0 network, 17.6
> memory}, id = 523
> 00-05 HashJoin(condition=[=($0, $3)], joinType=[inner]) :
> rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY
> o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75179.0 rows,
> 210547.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 522
> 00-07 SelectionVectorRemover : rowType = RecordType(ANY
> l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 1.0, cumulative cost
> = \{60176.0 rows, 180526.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 518
> 00-09 Limit(offset=[0], fetch=[0]) : rowType =
> RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 1.0,
> cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 0.0 network, 0.0
> memory}, id = 517
> 00-11 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]],
> selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1,
> usedMetadataFile=false, columns=[`l_orderkey`, `l_quantity`, `l_shipdate`]]])
> : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate):
> rowcount = 60175.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io,
> 0.0 network, 0.0 memory}, id = 516
> 00-06 SelectionVectorRemover : rowType = RecordType(ANY
> o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{15001.0 rows,
> 30001.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 521
> 00-08 Limit(offset=[0], fetch=[0]) : rowType =
> RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost =
> \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 520
> 00-10 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]],
> selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1,
> usedMetadataFile=false, columns=[`o_orderkey`, `o_custkey`]]]) : rowType =
> RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 15000.0, cumulative
> cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 519
> {noformat}
> and before changes:
> {noformat}
> 00-00 Screen : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY
> o_custkey): rowcount = 1.0, cumulative cost = \{150354.1 rows, 1052637.1 cpu,
> 0.0 io, 0.0 network, 264000.0 memory}, id = 452
> 00-01 Project(l_quantity=[$1], l_shipdate=[$2], o_custkey=[$4]) :
> rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount
> = 1.0, cumulative cost = \{150354.0 rows, 1052637.0 cpu, 0.0 io, 0.0 network,
> 264000.0 memory}, id = 451
> 00-02 SelectionVectorRemover : rowType = RecordType(ANY l_orderkey,
> ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount =
> 1.0, cumulative cost = \{150353.0 rows, 1052634.0 cpu, 0.0 io, 0.0 network,
> 264000.0 memory}, id = 450
> 00-03 Limit(fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY
> l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0,
> cumulative cost = \{150352.0 rows, 1052633.0 cpu, 0.0 io, 0.0 network,
> 264000.0 memory}, id = 449
> 00-04 Limit(fetch=[2]) : rowType = RecordType(ANY l_orderkey, ANY
> l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 2.0,
> cumulative cost = \{150352.0 rows, 1052633.0 cpu, 0.0 io, 0.0 network,
> 264000.0 memory}, id = 448
> 00-05 HashJoin(condition=[=($0, $3)], joinType=[inner]) :
> rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY
> o_orderkey, ANY o_custkey): rowcount = 60175.0, cumulative cost = \{150350.0
> rows, 1052625.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 447
> 00-07 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]],
> selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1,
> usedMetadataFile=false, columns=[`l_orderkey`, `l_quantity`, `l_shipdate`]]])
> : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate):
> rowcount = 60175.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io,
> 0.0 network, 0.0 memory}, id = 445
> 00-06 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]],
> selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1,
> usedMetadataFile=false, columns=[`o_orderkey`, `o_custkey`]]]) : rowType =
> RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 15000.0, cumulative
> cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 446
> {noformat}
> Also both early and late limit 0 optimizations will be enabled by default.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)