Paul Rogers created DRILL-5142:
----------------------------------
Summary: TestWindowFrame.testUnboundedFollowing relies on side
effects
Key: DRILL-5142
URL: https://issues.apache.org/jira/browse/DRILL-5142
Project: Apache Drill
Issue Type: Bug
Reporter: Paul Rogers
Assignee: Paul Rogers
Priority: Minor
The unit test {{TestWindowFrame.testUnboundedFollowing}} is one of a family of
tests that does the same query in two different ways, using the results of the
second to verify the first. Unfortunately, this particular tests "works" only
because it relies on undefined implementation artifacts about the way the
"verification" query is run in Drill.
Here is the query under test:
{code}
SELECT
position_id,
employee_id,
LAST_VALUE(employee_id)
OVER(PARTITION BY position_id
ORDER by employee_id
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS
`last_value`
FROM
dfs_test.`%s/window/b4.p4`
{code}
With expected results as follows:
{code}
1,0,9
...
1,9,9
{code}
Here is the "expected results" query:
{code}
SELECT
position_id,
employee_id,
MAX(employee_id) OVER(PARTITION BY position_id) AS `last_value`
FROM (
SELECT *
FROM dfs_test.`%s/window/b4.p4`
ORDER BY position_id, employee_id
)
{code}
The above happens to produce the correct results only because the query
executes in a single fragment. The query produces correct results with the
"unmanaged" external sort, but produces the following (valid) results with the
managed external sort:
{code}
1,0,9
1,2,9
...
1,9,9
1,1,9
{code}
The query relies on the inner query sort order "showing through" to the outer
query. But, if the query were distributed, the outer query would be unordered.
Hence, the verification query just happened to work, but is not actually valid.
The proper solution is to modify the verification query to move the ORDER BY to
the outer query:
{code}
...
FROM (
SELECT *
FROM dfs_test.`%s/window/b4.p4`
)
ORDER BY position_id, employee_id
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)