[jira] [Work logged] (HIVE-25410) CommonMergeJoin fails for ARRAY join keys with varying size

ASF GitHub Bot (Jira) Tue, 10 Aug 2021 04:05:05 -0700


     [ https://issues.apache.org/jira/browse/HIVE-25410?focusedWorklogId=636396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636396 ]


ASF GitHub Bot logged work on HIVE-25410:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Aug/21 11:04
            Start Date: 10/Aug/21 11:04
    Worklog Time Spent: 10m 
      Work Description: okumin commented on a change in pull request #2551:
URL: https://github.com/apache/hive/pull/2551#discussion_r685915956



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/HiveStructComparator.java
##########
@@ -45,16 +48,14 @@ public int compare(Object key1, Object key2) {
         if (a1.size() == 0) {
             return 0;
         }
-        if (comparator == null) {
-            comparator = new WritableComparator[a1.size()];
-            // For struct all elements may not be of same type, so create comparator for each entry.
-            for (int i = 0; i < a1.size(); i++) {
-                comparator[i] = WritableComparatorFactory.get(a1.get(i), nullSafe, nullOrdering);
-            }
+        // For array, the length may not be fixed, so extend comparators on demand
+        for (int i = comparators.size(); i < a1.size(); i++) {
+            // For struct, all elements may not be of same type, so create comparator for each entry.
+            comparators.add(i, WritableComparatorFactory.get(a1.get(i), nullSafe, nullOrdering));
         }
         result = 0;
         for (int i = 0; i < a1.size(); i++) {
-            result = comparator[i].compare(a1.get(i), a2.get(i));
+            result = comparators.get(i).compare(a1.get(i), a2.get(i));

Review comment:
       @zabetak Maybe; I have the same feeling. STRUCT and ARRAY are different
data structures, so we can take a different approach for each. That would also
make the implementation more straightforward.
   My idea is to let WritableComparatorFactory identify types more precisely so
that it can distinguish STRUCT, ARRAY, and so on. That will require some effort,
since WritableComparatorFactory would have to accept an ObjectInspector or type
information, but it sounds more robust than inferring data types from `Object`.
   Anyway, I agree with creating a follow-up ticket, and I will do that if you
have nothing more to discuss here.
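
   A minimal sketch of that type-driven idea, written without any Hive
dependency. The `KeyType`, `IntType`, `ListType`, and `StructType` classes are
hypothetical stand-ins for ObjectInspector or type information; they are not
Hive APIs. Ordering lists of different length by length first is an assumption
of this sketch, and null handling (`nullSafe`, `nullOrdering`) is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical type descriptor; stands in for ObjectInspector / type information.
abstract class KeyType {
    abstract Comparator<Object> comparator();
}

final class IntType extends KeyType {
    @Override
    Comparator<Object> comparator() {
        return (x, y) -> Integer.compare((Integer) x, (Integer) y);
    }
}

// ARRAY: every element has the same type, so one element comparator is enough
// regardless of how long each key is.
final class ListType extends KeyType {
    private final KeyType element;

    ListType(KeyType element) {
        this.element = element;
    }

    @Override
    Comparator<Object> comparator() {
        Comparator<Object> each = element.comparator();
        return (x, y) -> {
            List<?> a = (List<?>) x;
            List<?> b = (List<?>) y;
            // Assumption of this sketch: lists of different length order by length.
            if (a.size() != b.size()) {
                return Integer.compare(a.size(), b.size());
            }
            for (int i = 0; i < a.size(); i++) {
                int r = each.compare(a.get(i), b.get(i));
                if (r != 0) {
                    return r;
                }
            }
            return 0;
        };
    }
}

// STRUCT: the field count is fixed by the type, but each field may have its
// own type, so one comparator per field is built up front.
final class StructType extends KeyType {
    private final List<KeyType> fields;

    StructType(List<KeyType> fields) {
        this.fields = fields;
    }

    @Override
    Comparator<Object> comparator() {
        List<Comparator<Object>> perField = new ArrayList<>();
        for (KeyType f : fields) {
            perField.add(f.comparator());
        }
        return (x, y) -> {
            List<?> a = (List<?>) x;
            List<?> b = (List<?>) y;
            for (int i = 0; i < perField.size(); i++) {
                int r = perField.get(i).compare(a.get(i), b.get(i));
                if (r != 0) {
                    return r;
                }
            }
            return 0;
        };
    }
}
```

   With this shape, `new ListType(new IntType()).comparator()` can compare
`array<int>` keys of any length, because the comparator is derived from the
declared type rather than from the first value seen.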





Issue Time Tracking
-------------------

    Worklog Id:     (was: 636396)
    Time Spent: 1h 20m  (was: 1h 10m)

> CommonMergeJoin fails for ARRAY join keys with varying size
> -----------------------------------------------------------
>
>                 Key: HIVE-25410
>                 URL: https://issues.apache.org/jira/browse/HIVE-25410
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: okumin
>            Assignee: okumin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Thanks to HIVE-24883, CommonMergeJoinOperator can handle ARRAY or STRUCT 
> types as a JOIN key.
> There are corner cases where CommonMergeJoinOperator fails with 
> `ArrayIndexOutOfBoundsException`.
>  
> This is a simple case.
> {code:java}
> SET hive.auto.convert.join=false;
> CREATE TABLE table_list_types (id int, key array<int>);
> INSERT INTO table_list_types VALUES (1, array(1, 2)), (2, array(1, 2)), (3, array(1, 2, 3)), (4, array(1, 2, 3));
> SELECT * FROM table_list_types t1 INNER JOIN table_list_types t2 ON t1.key = t2.key; {code}
> With 69c97c26ac68a245f4d327cc2f7b3a2333f8fa84, the following error happened.
> {code:java}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
>       at org.apache.hadoop.hive.ql.exec.HiveStructComparator.compare(HiveStructComparator.java:57)
>       at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.compareKey(CommonMergeJoinOperator.java:629)
>       at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.compareKeys(CommonMergeJoinOperator.java:597)
>       at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.processKey(CommonMergeJoinOperator.java:566)
>       at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:249)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
>       ... 26 more {code}
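
The failure mode in the trace can be illustrated outside Hive with a minimal sketch. The classes below are hypothetical stand-ins, not Hive code: `FixedArrayComparator` mirrors the removed logic in the diff (a comparator array sized once from the first key), and `GrowingListComparator` mirrors the patched logic (comparators extended on demand). `ElementComparators.forSample` is an assumed replacement for `WritableComparatorFactory.get(...)` and simply relies on `Comparable`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for WritableComparatorFactory; assumes Comparable elements.
final class ElementComparators {
    private ElementComparators() {
    }

    @SuppressWarnings("unchecked")
    static Comparator<Object> forSample(Object sample) {
        return (x, y) -> ((Comparable<Object>) x).compareTo(y);
    }
}

// Mirrors the removed code: the array is sized from the first key only.
final class FixedArrayComparator {
    private Comparator<Object>[] comparator;

    @SuppressWarnings("unchecked")
    int compare(List<?> a1, List<?> a2) {
        if (comparator == null) {
            comparator = new Comparator[a1.size()];
            for (int i = 0; i < a1.size(); i++) {
                comparator[i] = ElementComparators.forSample(a1.get(i));
            }
        }
        for (int i = 0; i < a1.size(); i++) {
            // Throws ArrayIndexOutOfBoundsException once a key is longer
            // than the first key that sized the array.
            int result = comparator[i].compare(a1.get(i), a2.get(i));
            if (result != 0) {
                return result;
            }
        }
        return 0;
    }
}

// Mirrors the patched code: the comparator list grows to fit the current key.
final class GrowingListComparator {
    private final List<Comparator<Object>> comparators = new ArrayList<>();

    int compare(List<?> a1, List<?> a2) {
        for (int i = comparators.size(); i < a1.size(); i++) {
            comparators.add(ElementComparators.forSample(a1.get(i)));
        }
        for (int i = 0; i < a1.size(); i++) {
            int result = comparators.get(i).compare(a1.get(i), a2.get(i));
            if (result != 0) {
                return result;
            }
        }
        return 0;
    }
}
```

Both classes assume the two keys passed to `compare` have the same length. Comparing a pair of two-element keys and then a pair of three-element keys makes `FixedArrayComparator` fail at index 2, which matches the index reported in the trace, while `GrowingListComparator` handles both pairs.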



