[
https://issues.apache.org/jira/browse/IMPALA-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210505#comment-17210505
]
Tim Armstrong commented on IMPALA-10229:
----------------------------------------
A standalone example. It's easier to reproduce with num_nodes=1 btw, since the
distributed top-n tends to filter out fewer rows.
{noformat}
create table ranktbl(part int, ord int) stored as parquet;
insert into ranktbl values (1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2,
3), (2, 4);
set num_nodes=1;
select * from (
select *, rank() over (partition by part order by ord) rnk from ranktbl) v
where rnk < 5
order by part, ord desc
limit 5;
+------+-----+-----+
| part | ord | rnk |
+------+-----+-----+
| 1 | 4 | 4 |
| 1 | 3 | 3 |
| 1 | 2 | 2 |
| 1 | 1 | 1 |
| 2 | 1 | 1 |
+------+-----+-----+
select * from (select *, rank() over (partition by part order by ord) rnk from
ranktbl) v order by part, ord desc limit 2;
+------+-----+-----+
| part | ord | rnk |
+------+-----+-----+
| 1 | 2 | 2 |
| 1 | 1 | 1 |
+------+-----+-----+
{noformat}
I'm thinking this through, but I think we need some stronger preconditions for
the optimization.
* Case 1: the sort expressions fully match.
In this case the row order is the same between the output of the top-n and the
input of the analytic. So the first N rows returned from the top-n will be the
same rows as the output of the analytic sort *if* there are no rows filtered
out by a SELECT node in-between. So we need to ensure that the SELECT
predicates are always true. This would be true for, say, rank() <= 1000 if
there was a limit 100, since it will definitely be true for the first 100 rows.
* Case 2: only the partition exprs match
In this case the rows returned are going to come from the first partitions,
i.e. if the limit is 100, if we go through the partitions in order until the
row count adds up to 100, then we know that the rows much come from those
partitions.
The issue here is that, for the final partition, if the sort order is
different, then the rows returned from the top-n may not be the first rows in
the partition for the analytic. So if we filter out the first rows in the
partition before they are fed into the analytic, the results of the analytic
function can change.
One solution would be to increase the pushed down limit so that it was always
guaranteed to include all of the rows in the final partition to be returned.
E.g. if you had a row_number() <= 100 predicate and limit 100, if you pushed
down limit 200, then you'd be guaranteed to capture all of the rows in the
final partition. That may be trickier when it comes to rank() where you need to
deal with ties (since the top-n operator doesn't currently handle ties).
> Analytic limit pushdown optimization can be applied incorrect when there are
> no analytic predicates
> ---------------------------------------------------------------------------------------------------
>
> Key: IMPALA-10229
> URL: https://issues.apache.org/jira/browse/IMPALA-10229
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Blocker
> Labels: correctness
>
> {noformat}
> [localhost.EXAMPLE.COM:21050] default> select * from (select month, id,
> rank() over (partition by month order by id desc) rnk from
> functional_parquet.alltypes WHERE month >= 11) v order by month, id limit 3;
> +-------+------+-----+
> | month | id | rnk |
> +-------+------+-----+
> | 11 | 6987 | 3 |
> | 11 | 6988 | 2 |
> | 11 | 6989 | 1 |
> +-------+------+-----+
> Fetched 3 row(s) in 4.16s
> {noformat}
> These are not the top 3 rows when ordering by month, id . Hive's result is
> correct:
> {noformat}
> +----------+-------+--------+
> | v.month | v.id | v.rnk |
> +----------+-------+--------+
> | 11 | 3040 | 600 |
> | 11 | 3041 | 599 |
> | 11 | 3042 | 598 |
> +----------+-------+--------+
> {noformat}
> I think when there's no select predicates, that the ordering in the analytic
> sort needs to exactly match the TOP N sort ordering. I'm not sure if there
> are fixes needed for the case where there are select predicates.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]