[
https://issues.apache.org/jira/browse/HIVE-25856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stamatis Zampetakis reassigned HIVE-25856:
------------------------------------------
> Intermittent null ordering in plans of queries with GROUP BY and LIMIT
> ----------------------------------------------------------------------
>
> Key: HIVE-25856
> URL: https://issues.apache.org/jira/browse/HIVE-25856
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
>
> {code:sql}
> CREATE TABLE person (id INTEGER, country STRING);
> EXPLAIN CBO SELECT country, count(1) FROM person GROUP BY country LIMIT 5;
> {code}
> The {{EXPLAIN}} query produces a slightly different plan (ordering of nulls)
> from one execution to another.
> {noformat}
> CBO PLAN:
> HiveSortLimit(sort0=[$1], dir0=[ASC-nulls-first], fetch=[5])
> HiveProject(country=[$0], $f1=[$1])
> HiveAggregate(group=[{1}], agg#0=[count()])
> HiveTableScan(table=[[default, person]], table:alias=[person])
> {noformat}
> {noformat}
> CBO PLAN:
> HiveSortLimit(sort0=[$1], dir0=[ASC], fetch=[5])
> HiveProject(country=[$0], $f1=[$1])
> HiveAggregate(group=[{1}], agg#0=[count()])
> HiveTableScan(table=[[default, person]], table:alias=[person])
> {noformat}
> This is unlikely to cause wrong results cause most aggregate functions (not
> all) do not return nulls thus null ordering doesn't matter much but it can
> lead to other problems such as:
> * intermittent CI failures
> * query/plan caching
> I bumped into this problem after investigating test failures in CI. The
> following query in
> [offset_limit_ppd_optimizer.q|https://github.com/apache/hive/blob/9cfdac44975bf38193de7449fc21b9536109daea/ql/src/test/queries/clientpositive/offset_limit_ppd_optimizer.q]
> returns different plan when it runs individually and when it runs along with
> some other qtest files.
> {code:sql}
> explain
> select * from
> (select key, count(1) from src group by key order by key limit 10,20) subq
> join
> (select key, count(1) from src group by key limit 20,20) subq2
> on subq.key=subq2.key limit 3,5;
> {code}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)