[jira] [Commented] (DRILL-5371) Large run-time overhead for nested SELECT queries

Paul Rogers (JIRA) Mon, 20 Mar 2017 16:00:07 -0700

    [ https://issues.apache.org/jira/browse/DRILL-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933765#comment-15933765 ]


Paul Rogers commented on DRILL-5371:
------------------------------------

Example output from one run. Length is the length of the nested SELECT statement, in characters.

{code}
Length: 1129081
Results: 10 records, 1 batches, 652 ms
Length: 1334474
Results: 10 records, 1 batches, 821 ms
Length: 1539867
Results: 10 records, 1 batches, 1,189 ms
Length: 1745260
Results: 10 records, 1 batches, 1,184 ms
Length: 1950653
Results: 10 records, 1 batches, 1,263 ms
Length: 2156246
Results: 10 records, 1 batches, 1,506 ms
Length: 2362039
Results: 10 records, 1 batches, 1,914 ms
Length: 2567832
Results: 10 records, 1 batches, 2,573 ms
Length: 2773625
Results: 10 records, 1 batches, 2,420 ms
Length: 2979418
Results: 10 records, 1 batches, 2,856 ms
Length: 3185211
Results: 10 records, 1 batches, 3,179 ms
Length: 3391004
Results: 10 records, 1 batches, 3,592 ms
Length: 3596797
Results: 10 records, 1 batches, 4,434 ms
Length: 3802590
Results: 10 records, 1 batches, 5,058 ms
Length: 4008383
Results: 10 records, 1 batches, 5,713 ms
Length: 4214176
Results: 10 records, 1 batches, 6,692 ms
Length: 4419969
Results: 10 records, 1 batches, 7,944 ms
Length: 4625762
Results: 10 records, 1 batches, 8,510 ms
Length: 4831555
{code}
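
To compare how run time scales against statement length, the log can be parsed into (length, milliseconds) pairs. This is only a rough sketch: the function names `parse_runs` and `growth_ratios` are made up for illustration, and the sample text holds just the first and last complete runs from the output above, plus the final `Length` line that has no result.

```python
import re

def parse_runs(log_text):
    """Pair each 'Length: N' line with the 'Results: ...' line that follows it."""
    runs = []
    pending_length = None
    for line in log_text.splitlines():
        line = line.strip()
        m = re.match(r"Length:\s*(\d+)$", line)
        if m:
            pending_length = int(m.group(1))
            continue
        if line.startswith("Results:") and pending_length is not None:
            m = re.search(r"([\d,]+)\s*ms$", line)
            if m:
                # Times above 999 ms are printed with a thousands separator.
                runs.append((pending_length, int(m.group(1).replace(",", ""))))
                pending_length = None
    return runs

def growth_ratios(runs):
    """Return (length growth, time growth) between the first and last run."""
    (first_len, first_ms), (last_len, last_ms) = runs[0], runs[-1]
    return last_len / first_len, last_ms / first_ms

# Hypothetical sample: two complete runs copied from the log, then the
# trailing Length line that never received a Results line.
sample = """\
Length: 1129081
Results: 10 records, 1 batches, 652 ms
Length: 4625762
Results: 10 records, 1 batches, 8,510 ms
Length: 4831555
"""

runs = parse_runs(sample)
length_growth, time_growth = growth_ratios(runs)
print(f"length x{length_growth:.2f}, time x{time_growth:.2f}")
```

Between those two runs the statement grows by roughly a factor of 4.1 while the run time grows by roughly a factor of 13, which suggests the cost rises faster than linearly with statement length. This is a single run on a debug build, so it is an indication rather than a measurement.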

> Large run-time overhead for nested SELECT queries
> -------------------------------------------------
>
>                 Key: DRILL-5371
>                 URL: https://issues.apache.org/jira/browse/DRILL-5371
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>
> See DRILL-5370 - a test in which Drill was stress-tested with nested SELECT 
> queries of ever-increasing size.
> Semantically, the query does nothing other than:
> SELECT a AS b AS c AS ... AS z FROM foo;
> The above is not valid SQL, of course, but it shows that the nested SELECTs 
> do nothing other than create static aliases for columns, and do so many times 
> via layers of nested SELECTs.
> {code}
> SELECT y AS z FROM
>     (SELECT x AS y FROM
>         (SELECT w AS x FROM ...
>                            (SELECT a FROM someTable))))...))
> {code}
> Because the nested SELECTs do no actual processing, only impose aliases, the 
> optimizer should be able to optimize away the aliasing. That is, there should 
> be no need for any run-time work to simply change the name of a column.
> However, when run (with 200 columns, each with a 500-character name, but only 
> 10 rows), the overhead in a debug build is somewhere between 1/2 and 1 second 
> per nesting level.
> That is, for just 10 rows, each layer of nested SELECT adds up to about 1 
> second to the execution time.
> Queries of this form may be pathological if written by humans, but they are 
> typical of queries generated by BI tools. Hence, Drill performance for such 
> tools can be improved simply by avoiding unnecessary work.
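
For readers who want to reproduce the query shape described in the issue, here is a minimal sketch that builds the nested aliasing SELECTs for a single column. It is not the actual DRILL-5370 test, which used 200 columns with 500-character names. The function name `nested_alias_query` is hypothetical, and `someTable` is the placeholder table name from the description.

```python
def nested_alias_query(aliases, table):
    """Build nested SELECTs that rename one column through each alias in turn.

    aliases[0] is the real column name; each later entry adds one layer
    of nesting that only renames the column produced by the previous layer.
    """
    if not aliases:
        raise ValueError("need at least one column name")
    # Innermost query reads the real column from the table.
    query = f"SELECT {aliases[0]} FROM {table}"
    # Each (old, new) pair wraps the query so far in one more SELECT.
    for old, new in zip(aliases, aliases[1:]):
        query = f"SELECT {old} AS {new} FROM ({query})"
    return query

print(nested_alias_query(["a", "b", "c"], "someTable"))
```

With the aliases `a`, `b`, `c` this yields two renaming layers around the innermost SELECT. A longer alias list adds one layer per extra name, which is the growth pattern the stress test exercises.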



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
