[
https://issues.apache.org/jira/browse/HIVE-27480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HIVE-27480:
----------------------------------
Labels: pull-request-available (was: )
> OFFSET without ORDER BY generates wrong results
> -----------------------------------------------
>
> Key: HIVE-27480
> URL: https://issues.apache.org/jira/browse/HIVE-27480
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Affects Versions: 2.4.0, 3.1.3, 4.0.0-alpha-2
> Reporter: okumin
> Assignee: okumin
> Priority: Major
> Labels: pull-request-available
>
> Without ORDER BY, Hive doesn't add an additional single reducer, and it
> pushes OFFSET and LIMIT to the original vertex. It can generate incorrect
> results because OFFSET should be counted globally unlike LIMIT.
> We would make the following changes.
> - To fix the incorrect behavior
> - To add a new `hive.strict.checks.*` to prevent this usage. Mostly, OFFSET
> without ORDER BY is meaningless
> We can reproduce the issue in the following steps.
> *Prepare test data*
> The following SQLs generate a test table with multiple files.
> {code:sql}
> $ beeline -e "
> drop table test;
> create table test(id int);
> insert into test values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
> insert into test values (11), (12), (13), (14), (15), (16), (17), (18),
> (19), (20);
> insert into test values (21), (22), (23), (24), (25), (26), (27), (28),
> (29), (30);
> "
> {code}
> *Reproduce the issue*
> No rows are returned with `limit 10 offset 10`.
> {code:sql}
> $ beeline -e "select * from test limit 10, 10" --hiveconf
> hive.fetch.task.conversion=none --hiveconf tez.grouping.min-size=1 --hiveconf
> tez.grouping.max-size=1
> ...
> +----------+
> | test.id |
> +----------+
> +----------+
> No rows selected (13.595 seconds)
> {code}
> *Expected behavior*
> It should return any 10 rows like this.
> {code:sql}
> $ beeline -e "select * from test limit 10, 10"
> ...
> +----------+
> | test.id |
> +----------+
> | 11 |
> | 12 |
> | 13 |
> | 14 |
> | 15 |
> | 16 |
> | 17 |
> | 18 |
> | 19 |
> | 20 |
> +----------+
> 10 rows selected (0.175 seconds)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)