Re: [PR] HIVE-29375: FULL OUTER JOIN is failing with Unexpected hash table key type DATE [hive]

via GitHub Fri, 02 Jan 2026 11:09:17 -0800


zabetak commented on code in PR #6239:
URL: https://github.com/apache/hive/pull/6239#discussion_r2658229299



##########
ql/src/test/queries/clientpositive/vector_full_outer_join_date.q:
##########
@@ -0,0 +1,29 @@
+set hive.optimize.dynamic.partition.hashjoin=true;
+set hive.auto.convert.join=true;
+
+-- Test Date column
+create table tbl1 (id int, event_date date);
+create table tbl2 (id int, event_date date);
+
+insert into tbl1 values (1, '2023-01-01'), (2, '2023-01-02'), (3, '2023-01-03');
+insert into tbl2 values (2, '2023-01-02'), (3, '2023-01-04'), (4, '2023-01-05');
+
+select tbl1.id, tbl1.event_date from tbl1 full outer join tbl2 on tbl1.event_date = tbl2.event_date order by tbl1.id;

Review Comment:
   Can we also print the plan using `explain vectorization detail` to ensure 
that we are indeed using the expected vectorized operator?
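
   Something along these lines might do, as a sketch only (placing the 
explain right before the plain query is an assumption, not something taken 
from the PR):

   ```sql
   -- Record the vectorized plan in the golden file so a reader can check
   -- which map join operator is chosen for the DATE join key.
   explain vectorization detail
   select tbl1.id, tbl1.event_date from tbl1 full outer join tbl2 on tbl1.event_date = tbl2.event_date order by tbl1.id;
   ```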



##########
ql/src/test/queries/clientpositive/vector_full_outer_join_date.q:
##########
@@ -0,0 +1,29 @@
+set hive.optimize.dynamic.partition.hashjoin=true;
+set hive.auto.convert.join=true;
+
+-- Test Date column
+create table tbl1 (id int, event_date date);
+create table tbl2 (id int, event_date date);
+
+insert into tbl1 values (1, '2023-01-01'), (2, '2023-01-02'), (3, '2023-01-03');
+insert into tbl2 values (2, '2023-01-02'), (3, '2023-01-04'), (4, '2023-01-05');
+
+select tbl1.id, tbl1.event_date from tbl1 full outer join tbl2 on tbl1.event_date = tbl2.event_date order by tbl1.id;
+
+-- Test timestamp column
+create table tbl3 (id int, event_date timestamp);
+create table tbl4 (id int, event_date timestamp);
+
+insert into tbl3 values (1, '2025-12-17 10:20:30'), (2, '2025-12-17 11:20:30');
+insert into tbl4 values (2, '2025-12-17 11:20:30'), (3, '2025-12-17 09:20:30');
+
+select tbl3.id, tbl3.event_date from tbl3 full outer join tbl4 on tbl3.event_date = tbl4.event_date order by tbl3.id;
+
+-- Test Double column
+create table tbl5 (id int, val double);
+create table tbl6 (id int, val double);
+
+insert into tbl5 values (1, 5.6D), (2, 3.2D);
+insert into tbl6 values (2, 3.2D), (3, 7.2D);
+
+select tbl5.id, tbl5.val from tbl5 full outer join tbl6 on tbl5.val = tbl6.val order by tbl5.id;

Review Comment:
   Why are we adding tests for the TIMESTAMP and DOUBLE types? They don't seem 
to be on the same code path as DATE. Are we fixing anything with respect to 
those data types?



##########
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongHashMap.java:
##########


Review Comment:
   The code here seems very similar to 
`org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastLongHashUtil#deserializeLongKey`.
   Should we reuse that method instead?



##########
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/TestVectorMapJoinFastRowHashMap.java:
##########
@@ -495,6 +495,47 @@ public void testBigIntRowsExact() throws Exception {
         /* doClipping */ false, /* useExactBytes */ true);
   }
 
+  @Test
+  public void testDateRowsExact() throws Exception {
+    random = new Random(44332);
+
+    // Use a large capacity that doesn't require expansion, yet.
+    VectorMapJoinFastLongHashMapContainer map =
+        new VectorMapJoinFastLongHashMapContainer(
+            false,
+            false,
+            HashTableKeyType.DATE,
+            LARGE_CAPACITY,
+            LOAD_FACTOR,
+            LARGE_WB_SIZE,
+            -1,
+            tableDesc,
+            4);
+
+    VerifyFastRowHashMap verifyTable = new VerifyFastRowHashMap();
+    VectorRandomRowSource valueSource = new VectorRandomRowSource();
+
+    valueSource.init(
+        random,
+        VectorRandomRowSource.SupportedTypes.ALL,
+        4,
+        /* allowNulls */ false, /* isUnicodeOk */
+        false);
+
+    int rowCount = 1000;
+    Object[][] rows = valueSource.randomRows(rowCount);
+
+    addAndVerifyRows(
+        valueSource,
+        rows,
+        map,
+        HashTableKeyType.DATE,
+        verifyTable,
+        new String[] {"date"},
+        /* doClipping */ false, /* useExactBytes */

Review Comment:
   nit: Drop redundant comments



##########
ql/src/test/queries/clientpositive/vector_full_outer_join_date.q:
##########
@@ -0,0 +1,29 @@
+set hive.optimize.dynamic.partition.hashjoin=true;
+set hive.auto.convert.join=true;
+
+-- Test Date column
+create table tbl1 (id int, event_date date);
+create table tbl2 (id int, event_date date);
+
+insert into tbl1 values (1, '2023-01-01'), (2, '2023-01-02'), (3, '2023-01-03');
+insert into tbl2 values (2, '2023-01-02'), (3, '2023-01-04'), (4, '2023-01-05');
+
+select tbl1.id, tbl1.event_date from tbl1 full outer join tbl2 on tbl1.event_date = tbl2.event_date order by tbl1.id;

Review Comment:
   Since we are performing a join, it would be nice to also SELECT columns 
from tbl2; otherwise we can't tell if the result is correct.
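
   A possible shape for the query, as a sketch (the exact column list and 
the extra sort key are assumptions, not part of the PR):

   ```sql
   -- Project both sides of the join so rows that exist only in tbl2 are
   -- distinguishable instead of appearing as all-NULL rows.
   select tbl1.id, tbl1.event_date, tbl2.id, tbl2.event_date
   from tbl1 full outer join tbl2 on tbl1.event_date = tbl2.event_date
   order by tbl1.id, tbl2.id;
   ```

   With the sample data above only the 2023-01-02 rows match, so five rows 
are expected: one matched pair, two rows only in tbl1, and two only in tbl2.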



##########
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/TestVectorMapJoinFastRowHashMap.java:
##########
@@ -495,6 +495,47 @@ public void testBigIntRowsExact() throws Exception {
         /* doClipping */ false, /* useExactBytes */ true);
   }
 
+  @Test
+  public void testDateRowsExact() throws Exception {
+    random = new Random(44332);
+
+    // Use a large capacity that doesn't require expansion, yet.
+    VectorMapJoinFastLongHashMapContainer map =
+        new VectorMapJoinFastLongHashMapContainer(
+            false,
+            false,
+            HashTableKeyType.DATE,
+            LARGE_CAPACITY,
+            LOAD_FACTOR,
+            LARGE_WB_SIZE,
+            -1,
+            tableDesc,
+            4);
+
+    VerifyFastRowHashMap verifyTable = new VerifyFastRowHashMap();
+    VectorRandomRowSource valueSource = new VectorRandomRowSource();
+
+    valueSource.init(
+        random,
+        VectorRandomRowSource.SupportedTypes.ALL,
+        4,
+        /* allowNulls */ false, /* isUnicodeOk */

Review Comment:
   nit: Drop redundant comments



##########
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/mapjoin/MapJoinTestConfig.java:
##########
@@ -394,6 +394,9 @@ public static VectorMapJoinDesc createVectorMapJoinDesc(MapJoinTestDescription t
       case LONG:
         hashTableKeyType = HashTableKeyType.LONG;
         break;
+      case DATE:
+        hashTableKeyType = HashTableKeyType.DATE;
+        break;

Review Comment:
   Do we have unit tests exercising this config? Do we need to add something 
to `TestMapJoinOperator`?


