Paul Rogers created IMPALA-7927:
-----------------------------------

             Summary: Enhance Rewritten SQL in test files
                 Key: IMPALA-7927
                 URL: https://issues.apache.org/jira/browse/IMPALA-7927
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 3.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers


A recent enhancement added a helpful feature: showing the rewritten SQL as part 
of the plan included in test cases such as PlannerTest. The rewritten SQL 
visualizes transforms that previously were hidden from view.

The current version can be made even more useful by addressing a few 
limitations.

h4. SELECT clause shows rewritten SQL, but HAVING does not

Due to nuances of the code, the SELECT statement maintains two copies of the 
HAVING clause: one before rewrites and alias substitutions, the other after. 
The toSQL() method always uses the before-rewrites version, even with the SQL 
options are set to show rewritten SQL. In this case, enhance the code to show 
the rewritten version instead.

Example of current version:

{code:sql}
HAVING CAST(1048576 AS BIGINT) * count(*) % CAST(2 AS
BIGINT) = CAST(0 AS BIGINT)
{code}

Example of enhanced version:

{code:sql}
HAVING CAST(1048576 AS BIGINT) * zeroifnull(count(*)) %
CAST(2 AS BIGINT) = CAST(0 AS BIGINT)
{code}

Notice that the enhanced version shows the actual rewritten SQL, including the 
{{zeroifnull()}} method inserted by the analyzer.

h4. Show column references directly, not as expressions

The analyzer performs a pass over the HAVING (and other) clauses to replace 
expressions in HAVING with references to the same expression materialized in 
the result set.

At the same time, Impala creates labels for result set columns that show the 
expression:

{code:sql}
SELECT id + 2 FROM alltypestiny;
{code}

{noformat}
+--------+
| id + 2 |
+--------+
| 2      |
…
{noformat} 

Combine these two, and the rewritten SQL shown in the plan is misleading: 
column references are replaced with the label, which is defined as the 
unrewitten SQL without implicit casts. Original query:

{code:sql}
select sum(1 + 1 + id)
from functional.alltypes
group by timestamp_col = cast('2015-11-15' as timestamp) + interval 1 year
having 1024 * 1024 * count(*) % 2 = 0
  and (sum(1 + 1 + id) > 1 or sum(1 + 1 + id) > 1)
  and (sum(1 + 1 + id) between 5 and 10)
{code}

Rewritten form shown in the test file:

{code:sql}
Analyzed query:
SELECT sum(CAST(2 AS BIGINT) + CAST(id AS BIGINT))
FROM functional.alltypes
GROUP BY timestamp_col = TIMESTAMP '2016-11-15 00:00:00'
HAVING CAST(1048576 AS BIGINT) * count(*) % CAST(2 AS BIGINT) = CAST(0 AS
BIGINT)
AND sum(2 + id) > CAST(1 AS BIGINT)
AND sum(2 + id) >= CAST(5 AS BIGINT)
AND sum(2 + id) <= CAST(10 AS BIGINT)
{code}

Note that the original {{sum(1 + 1 + id)}} is shown rewritten in the {{SELECT}} 
clause. Note that references to that expression in {{HAVING}} have been 
rewritten and are shown with the label, which is the rewritten form, but 
without implicit casts: {{sum(2 + id)}}. Someone versed in the art will 
identify these as column references, but it is a bit hard for newbies.

Better would be to clearly show such references, perhaps using a modified Hive 
syntax, with the expression as a comment:

{code:sql}
HAVING CAST(1048576 AS BIGINT) * _o2 /* count(*) */ % CAST(2 AS BIGINT) = CAST(0
AS BIGINT)
AND _o1 /* sum(2 + id) */ > CAST(1 AS BIGINT)
AND _o1 /* sum(2 + id) */ >= CAST(5 AS BIGINT)
AND _o1 /* sum(2 + id) */ <= CAST(10 AS BIGINT)
{code}

Here, {{_oi}} refers to the output row, leaving the Hive labels, {{_ci}} to 
refer to the select items.

Note that these changes apply *only* when using the option to produce rewritten 
SQL, which is used only when generating the “Analyzed Query” section of the 
test file. Users will never see this change. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to