[jira] [Commented] (HIVE-27342) Duplicate row retured using Order by, Limit and Offset

okumin (Jira) Sun, 18 Jun 2023 00:08:05 -0700


    [ https://issues.apache.org/jira/browse/HIVE-27342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733864#comment-17733864 ]


okumin commented on HIVE-27342:
-------------------------------

[~jimmydeng] I could not reproduce the issue on 4.0.0-alpha-2.
{code:java}
0: jdbc:hive2://hive-hiveserver2:10000/defaul> create table t1(f1 int);
...
No rows affected (0.057 seconds)
0: jdbc:hive2://hive-hiveserver2:10000/defaul> insert into t1 values(111),(222),(333),(444),(555),(666),(777),(888),(999);
...
9 rows affected (11.162 seconds)
...
0: jdbc:hive2://hive-hiveserver2:10000/defaul> select * from t1 order by f1 limit 0,3;
...
+--------+
| t1.f1  |
+--------+
| 111    |
| 222    |
| 333    |
+--------+
...
0: jdbc:hive2://hive-hiveserver2:10000/defaul> select * from t1 order by f1 limit 3,3;
...
+--------+
| t1.f1  |
+--------+
| 444    |
| 555    |
| 666    |
+--------+{code}
I remember we applied [several patches to VectorLimitOperator|https://github.com/apache/hive/commits/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java] to fix issues related to OFFSET.

Actually, we backported 
[HIVE-22120|https://issues.apache.org/jira/browse/HIVE-22120], 
[HIVE-22164|https://issues.apache.org/jira/browse/HIVE-22164], and 
[HIVE-23265|https://issues.apache.org/jira/browse/HIVE-23265]. I suspect 
HIVE-22164 would fix your problem, though my memory could be wrong.
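
To check whether the vectorized reduce path is what produces the duplicate on 3.1.1, the setting named in the report can be toggled within one session. This is only a sketch that reuses the table t1 and the property from the issue description. The comments restate what the report describes; they are not output captured on a real cluster.
{code:sql}
-- Vectorized reduce disabled: per the report, the problem does not occur,
-- so page 2 is expected to be 444, 555, 666.
set hive.vectorized.execution.reduce.enabled=false;
select * from t1 order by f1 limit 3,3;

-- Vectorized reduce enabled: presumably the setting in effect when
-- 3.1.1 returned 333, 444, 555 for page 2.
set hive.vectorized.execution.reduce.enabled=true;
select * from t1 order by f1 limit 3,3;
{code}
If the two results differ, that points at VectorLimitOperator and the OFFSET-related patches mentioned above.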

> Duplicate row retured using Order by, Limit and Offset
> ------------------------------------------------------
>
>                 Key: HIVE-27342
>                 URL: https://issues.apache.org/jira/browse/HIVE-27342
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.1
>            Reporter: jimmydeng
>            Priority: Major
>
> Create an example table: 
> {code:java}
> create table t1(f1 int);
> insert into t1 values(111),(222),(333),(444),(555),(666),(777),(888),(999); 
> {code}
>  
> Query using order by, limit, offset. Page 1 is correct: 
> {code:java}
> select * from t1 order by f1 limit 0,3;
> +---------+
> | t1.f1   |
> +---------+
> | 111     |
> | 222     |
> | 333     |
> +---------+{code}
>  
> But there is a duplicate row `333` on page 2: 
> {code:java}
> select * from t1 order by f1 limit 3,3;
> +---------+
> | t1.f1   |
> +---------+
> | 333     |
> | 444     |
> | 555     |
> +---------+  
> {code}
> The problem does not occur with set hive.vectorized.execution.reduce.enabled=false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
