Re: [PR] HIVE-28029: Make unit tests based on TxnCommandsBaseForTests/DbTxnManagerEndToEndTestBase run on Tez [hive]

via GitHub Sat, 07 Dec 2024 06:21:27 -0800


kasakrisz commented on code in PR #5559:
URL: https://github.com/apache/hive/pull/5559#discussion_r1874494361



##########
ql/src/test/org/apache/hadoop/hive/ql/TestTxnNoBuckets.java:
##########
@@ -429,33 +424,33 @@ now that T is Acid, data for each writerId is treated like a logical bucket (tho
      logical bucket (tranche)
      */
     String expected2[][] = {
-        {"{\"writeid\":0,\"bucketid\":537001984,\"rowid\":0}\t1\t2",  "warehouse/t/000002_0"},
-        {"{\"writeid\":0,\"bucketid\":537001984,\"rowid\":1}\t2\t4",  "warehouse/t/000002_0"},
-        {"{\"writeid\":0,\"bucketid\":536870912,\"rowid\":0}\t5\t6",  "warehouse/t/000000_0"},
-        {"{\"writeid\":0,\"bucketid\":536936448,\"rowid\":0}\t6\t8",  "warehouse/t/000001_0"},
-        {"{\"writeid\":0,\"bucketid\":536870912,\"rowid\":1}\t9\t10", "warehouse/t/000000_0"},
+        {"{\"writeid\":0,\"bucketid\":536870912,\"rowid\":1}\t1\t2",  "warehouse/t/HIVE_UNION_SUBDIR_1/000000_0"},
+        {"{\"writeid\":0,\"bucketid\":536870912,\"rowid\":2}\t2\t4",  "warehouse/t/HIVE_UNION_SUBDIR_1/000000_0"},
+        {"{\"writeid\":0,\"bucketid\":536870912,\"rowid\":6}\t5\t6",  "warehouse/t/HIVE_UNION_SUBDIR_2/000000_0"},
+        {"{\"writeid\":0,\"bucketid\":536936448,\"rowid\":1}\t6\t8",  "warehouse/t/HIVE_UNION_SUBDIR_2/000001_0"},
+        {"{\"writeid\":0,\"bucketid\":536870912,\"rowid\":7}\t9\t10", "warehouse/t/HIVE_UNION_SUBDIR_3/000000_0"},
         {"{\"writeid\":0,\"bucketid\":536870912,\"rowid\":3}\t10\t20", "warehouse/t/HIVE_UNION_SUBDIR_15/000000_0"},
-        {"{\"writeid\":0,\"bucketid\":536870912,\"rowid\":2}\t12\t12", "warehouse/t/000000_0_copy_1"},
+        {"{\"writeid\":0,\"bucketid\":536870912,\"rowid\":0}\t12\t12", "warehouse/t/000000_0"},
         {"{\"writeid\":0,\"bucketid\":536870912,\"rowid\":4}\t20\t40", "warehouse/t/HIVE_UNION_SUBDIR_15/000000_0"},
         {"{\"writeid\":0,\"bucketid\":536870912,\"rowid\":5}\t50\t60", "warehouse/t/HIVE_UNION_SUBDIR_16/000000_0"},
-        {"{\"writeid\":0,\"bucketid\":536936448,\"rowid\":1}\t60\t80", "warehouse/t/HIVE_UNION_SUBDIR_16/000001_0"},
+        {"{\"writeid\":0,\"bucketid\":536936448,\"rowid\":0}\t60\t80", "warehouse/t/HIVE_UNION_SUBDIR_16/000001_0"},
     };
     checkExpected(rs, expected2,"after converting to acid (no compaction)");
     Assert.assertEquals(0, BucketCodec.determineVersion(536870912).decodeWriterId(536870912));
     Assert.assertEquals(2, BucketCodec.determineVersion(537001984).decodeWriterId(537001984));
     Assert.assertEquals(1, BucketCodec.determineVersion(536936448).decodeWriterId(536936448));
 
-    assertVectorized(shouldVectorize(), "update T set b = 88 where b = 80");
+    assertVectorized("update T set b = 88 where b = 80");
     runStatementOnDriver("update T set b = 88 where b = 80");
-    assertVectorized(shouldVectorize(), "delete from T where b = 8");
+    assertVectorized("delete from T where b = 8");
     runStatementOnDriver("delete from T where b = 8");
     String expected3[][] = {
-        {"{\"writeid\":0,\"bucketid\":537001984,\"rowid\":0}\t1\t2",  "warehouse/t/000002_0"},
-        {"{\"writeid\":0,\"bucketid\":537001984,\"rowid\":1}\t2\t4",  "warehouse/t/000002_0"},
-        {"{\"writeid\":0,\"bucketid\":536870912,\"rowid\":0}\t5\t6",  "warehouse/t/000000_0"},
-        {"{\"writeid\":0,\"bucketid\":536870912,\"rowid\":1}\t9\t10", "warehouse/t/000000_0"},
+        {"{\"writeid\":0,\"bucketid\":536870912,\"rowid\":1}\t1\t2",  "warehouse/t/HIVE_UNION_SUBDIR_1/000000_0"},

Review Comment:
   Oh, ok. For bucket files that are not in ACID format, the row_id is
generated at read time, and the bucket id comes from the file name:
   
https://github.com/apache/hive/blob/c27d31722c2a7426f3236d3d892dbe1e206e840d/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L547C5-L547C44
   For ACID writes, the bucket id comes from the taskId:
   
https://github.com/apache/hive/blob/c27d31722c2a7426f3236d3d892dbe1e206e840d/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L969C13-L969C26
   
   With Tez, the taskIds are mapped to the files differently.
   
   I changed the update statement so that it updates more than one record,
which produces more than one bucket in the new delta.
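
   For anyone reading the expected values in the diff, here is a rough,
standalone sketch of how the `bucketid` numbers relate to the writer id and to
the file name. It is not the Hive implementation: the bit layout is my
understanding of the V1 `BucketCodec` encoding (3 bits version, 1 reserved,
12 bits writer id, 4 reserved, 12 bits statement id), and `BucketIdSketch`,
`decodeVersion` and `bucketFromFileName` are hypothetical names used only for
illustration. The three sample values are the ones asserted in the test.

```java
// Illustrative sketch only; not Hive's BucketCodec or AcidUtils code.
final class BucketIdSketch {
    private BucketIdSketch() {}

    // Assumed V1 layout: the top 3 bits carry the codec version.
    static int decodeVersion(int bucketProperty) {
        return bucketProperty >>> 29;
    }

    // Assumed V1 layout: bits 16..27 carry the writer (bucket) id.
    static int decodeWriterId(int bucketProperty) {
        return (bucketProperty & 0x0FFF0000) >>> 16;
    }

    // Hypothetical helper: take the leading digits of a bucket file name
    // such as "000002_0" or "000000_0_copy_1" as the bucket id.
    static int bucketFromFileName(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        int end = name.indexOf('_');
        return Integer.parseInt(end < 0 ? name : name.substring(0, end));
    }

    public static void main(String[] args) {
        int[] bucketIds = {536870912, 536936448, 537001984};
        for (int id : bucketIds) {
            System.out.println(id + " -> version " + decodeVersion(id)
                + ", writerId " + decodeWriterId(id));
        }
        System.out.println(
            bucketFromFileName("warehouse/t/HIVE_UNION_SUBDIR_2/000001_0"));
    }
}
```

   Under these assumptions the decoded writer ids (0, 1, 2) line up with the
numeric prefix of the file names in the expected rows, e.g. `536936448` with
`000001_0`.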



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

