[jira] [Created] (HIVE-27480) OFFSET without ORDER BY generates wrong results

okumin (Jira) Sun, 02 Jul 2023 07:08:03 -0700

okumin created HIVE-27480:
-----------------------------

             Summary: OFFSET without ORDER BY generates wrong results
                 Key: HIVE-27480
                 URL: https://issues.apache.org/jira/browse/HIVE-27480
             Project: Hive
          Issue Type: Bug
          Components: Query Planning
    Affects Versions: 4.0.0-alpha-2, 3.1.3, 2.4.0
            Reporter: okumin
            Assignee: okumin



Without ORDER BY, Hive doesn't add an additional single reducer, and it pushes 
OFFSET and LIMIT to the original vertex. It can generate incorrect results 
because OFFSET should be counted globally unlike LIMIT.

We would make the following changes.
- To fix the incorrect behavior
- To add a new `hive.strict.checks.*` to prevent this usage. Mostly, OFFSET 
without ORDER BY is meaningless

We can reproduce the issue in the following steps.

*Prepare test data*

The following SQLs generate a test table with multiple files.

{code:sql}
$ beeline -e "
  drop table test;
  create table test(id int);
  insert into test values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
  insert into test values (11), (12), (13), (14), (15), (16), (17), (18), (19), 
(20);
  insert into test values (21), (22), (23), (24), (25), (26), (27), (28), (29), 
(30);
"
{code}

*Reproduce the issue*

No rows are returned with `limit 10 offset 10`.

{code:sql}
$ beeline -e "select * from test limit 10, 10" --hiveconf 
hive.fetch.task.conversion=none --hiveconf tez.grouping.min-size=1 --hiveconf 
tez.grouping.max-size=1
...
+----------+
| test.id  |
+----------+
+----------+
No rows selected (13.595 seconds)
{code}

*Expected behavior*

It should return any 10 rows like this.

{code:sql}
$ beeline -e "select * from test limit 10, 10"
...
+----------+
| test.id  |
+----------+
| 11       |
| 12       |
| 13       |
| 14       |
| 15       |
| 16       |
| 17       |
| 18       |
| 19       |
| 20       |
+----------+
10 rows selected (0.175 seconds)
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (HIVE-27480) OFFSET without ORDER BY generates wrong results

Reply via email to