[jira] [Created] (HIVE-25856) Intermittent null ordering in plans of queries with GROUP BY and LIMIT

Stamatis Zampetakis (Jira) Mon, 10 Jan 2022 07:42:27 -0800

Stamatis Zampetakis created HIVE-25856:
------------------------------------------


             Summary: Intermittent null ordering in plans of queries with GROUP 
BY and LIMIT
                 Key: HIVE-25856
                 URL: https://issues.apache.org/jira/browse/HIVE-25856
             Project: Hive
          Issue Type: Bug
          Components: CBO
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


{code:sql}
CREATE TABLE person (id INTEGER, country STRING);
EXPLAIN CBO SELECT country, count(1) FROM person GROUP BY country LIMIT 5;
{code}

The {{EXPLAIN}} query produces a slightly different plan (ordering of nulls) 
from one execution to another.

{noformat}
CBO PLAN:
HiveSortLimit(sort0=[$1], dir0=[ASC-nulls-first], fetch=[5])
  HiveProject(country=[$0], $f1=[$1])
    HiveAggregate(group=[{1}], agg#0=[count()])
      HiveTableScan(table=[[default, person]], table:alias=[person])
{noformat}

{noformat}
CBO PLAN:
HiveSortLimit(sort0=[$1], dir0=[ASC], fetch=[5])
  HiveProject(country=[$0], $f1=[$1])
    HiveAggregate(group=[{1}], agg#0=[count()])
      HiveTableScan(table=[[default, person]], table:alias=[person])
{noformat}

This is unlikely to cause wrong results cause most aggregate functions (not 
all) do not return nulls thus null ordering doesn't matter much but it can lead 
to other problems such as:
* intermittent CI failures
* query/plan caching

I bumped into this problem after investigating test failures in CI. The 
following query in 
[offset_limit_ppd_optimizer.q|https://github.com/apache/hive/blob/9cfdac44975bf38193de7449fc21b9536109daea/ql/src/test/queries/clientpositive/offset_limit_ppd_optimizer.q]
 returns different plan when it runs individually and when it runs along with 
some other qtest files.

{code:sql}
explain
select * from
(select key, count(1) from src group by key order by key limit 10,20) subq
join
(select key, count(1) from src group by key limit 20,20) subq2
on subq.key=subq2.key limit 3,5;
{code}




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Created] (HIVE-25856) Intermittent null ordering in plans of queries with GROUP BY and LIMIT

Reply via email to