Michael Ho created IMPALA-6781:
----------------------------------

             Summary: Stress tests should take into account of 
non-deterministic results of queries
                 Key: IMPALA-6781
                 URL: https://issues.apache.org/jira/browse/IMPALA-6781
             Project: IMPALA
          Issue Type: Bug
          Components: Infrastructure
    Affects Versions: Impala 3.0, Impala 2.12.0
            Reporter: Michael Ho
            Assignee: Michael Brown


Certain queries have the pattern of ordery-by with a limit clause. Due to the 
limit clause, we may not include all the rows for a given order-by keys in the 
output. It's non-deterministic that which rows will be included in the output. 
Given the stress test will store the hashes of the query result obtained during 
binary search, we may sometimes get false alarm on result mis-match. One way to 
fix it is that we need to order all the columns so that the results will always 
be deterministic.

For instance, the following TPC-H queries may have multiple correct results:
{noformat}
    Default Db: tpch_10000_parquet
    Sql Statement: /* Mem: 8152 MB. Coordinator: vc1503.halxg.cloudera.com. */
select
  l_orderkey,
  sum(l_extendedprice * (1 - l_discount)) as revenue,
  o_orderdate,
  o_shippriority
from
  customer,
  orders,
  lineitem
where
  c_mktsegment = 'BUILDING'
  and c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and o_orderdate < '1995-03-15'
  and l_shipdate > '1995-03-15'
group by
  l_orderkey,
  o_orderdate,
  o_shippriority
order by
  revenue desc,
  o_orderdate
limit 10
{noformat}
Correct result 1:
{noformat}
+-------------+-------------+-------------+----------------+
| l_orderkey  | revenue     | o_orderdate | o_shippriority |
+-------------+-------------+-------------+----------------+
| 18134335556 | 510414.0360 | 1995-02-13  | 0              |
| 52494073892 | 510414.0360 | 1995-02-13  | 0              |
| 26724270146 | 510414.0360 | 1995-02-13  | 0              |
| 954466400   | 510414.0360 | 1995-02-13  | 0              |
| 9544400966  | 510414.0360 | 1995-02-13  | 0              |
| 35314204736 | 510414.0360 | 1995-02-13  | 0              |
| 43904139302 | 510414.0360 | 1995-02-13  | 0              |
| 7133834945  | 503732.3318 | 1995-02-25  | 0              |
| 24313704101 | 503732.3318 | 1995-02-25  | 0              |
| 58673442437 | 503732.3318 | 1995-02-25  | 0              |
+-------------+-------------+-------------+----------------+
{noformat}
Correct result 2:
{noformat}
+-------------+-------------+-------------+----------------+
| l_orderkey  | revenue     | o_orderdate | o_shippriority |
+-------------+-------------+-------------+----------------+
| 26724270146 | 510414.0360 | 1995-02-13  | 0              |
| 43904139302 | 510414.0360 | 1995-02-13  | 0              |
| 35314204736 | 510414.0360 | 1995-02-13  | 0              |
| 52494073892 | 510414.0360 | 1995-02-13  | 0              |
| 18134335556 | 510414.0360 | 1995-02-13  | 0              |
| 9544400966  | 510414.0360 | 1995-02-13  | 0              |
| 954466400   | 510414.0360 | 1995-02-13  | 0              |
| 32903638691 | 503732.3318 | 1995-02-25  | 0              |
| 15723769511 | 503732.3318 | 1995-02-25  | 0              |
| 50083507847 | 503732.3318 | 1995-02-25  | 0              |
+-------------+-------------+-------------+----------------+
{noformat}
If one changes the limit clause to 15, we will see that the last 3 rows should 
be a subset of rows with (503732.3318, 1995-02-25) as the ordering keys 
{{revenue}} and {{o_orderdate}} are identical so it's non-deterministic which 
rows are shown. The result will be deterministic if the order-by clause also 
include {{l_orderkey}} and {{o_shippriority}}
{noformat}
+-------------+-------------+-------------+----------------+
| l_orderkey  | revenue     | o_orderdate | o_shippriority |
+-------------+-------------+-------------+----------------+
| 26724270146 | 510414.0360 | 1995-02-13  | 0              |
| 43904139302 | 510414.0360 | 1995-02-13  | 0              |
| 35314204736 | 510414.0360 | 1995-02-13  | 0              |
| 52494073892 | 510414.0360 | 1995-02-13  | 0              |
| 18134335556 | 510414.0360 | 1995-02-13  | 0              |
| 9544400966  | 510414.0360 | 1995-02-13  | 0              |
| 954466400   | 510414.0360 | 1995-02-13  | 0              |
| 32903638691 | 503732.3318 | 1995-02-25  | 0              |
| 15723769511 | 503732.3318 | 1995-02-25  | 0              |
| 50083507847 | 503732.3318 | 1995-02-25  | 0              |
| 7133834945  | 503732.3318 | 1995-02-25  | 0              |
| 41493573281 | 503732.3318 | 1995-02-25  | 0              |
| 24313704101 | 503732.3318 | 1995-02-25  | 0              |
| 58673442437 | 503732.3318 | 1995-02-25  | 0              |
| 54409232481 | 499104.0645 | 1995-03-10  | 0              |
+-------------+-------------+-------------+----------------+
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to