Re: [PR] HIVE-28935: Iceberg: fix partition filtering condition in compaction query [hive]

via GitHub Fri, 09 May 2025 11:19:04 -0700


difin commented on code in PR #5792:
URL: https://github.com/apache/hive/pull/5792#discussion_r2082259745



##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergQueryCompactor.java:
##########
@@ -96,16 +106,46 @@ public boolean run(CompactorContext context) throws IOException, HiveException,
         throw new HiveException(ErrorMsg.COMPACTION_NO_PARTITION);
       }
     } else {
-      long partitionHash = IcebergTableUtil.getPartitionHash(icebergTable, partSpec);
+      Pair<Integer, StructProjection> partSpecPair =
+          IcebergTableUtil.getPartitionSpecIdAndStruct(icebergTable, partSpec);

Review Comment:
   > why can't we do this directly on FILES table? 
   
   Because we need to know the values to use in the `named_struct`, but 
IcebergQueryCompactor receives the partition name in human-readable format, so 
the partition values need to be retrieved and converted.
   
   An example from an existing Iceberg compaction q-test: we compact a table 
with partition spec `spec(truncate(3, event_src), month(event_time))`. 
   IcebergQueryCompactor gets `ci.partName` = 
`event_src_trunc=BBB/event_time_month=2024-08`.
   The query on the `files` table needs to be 
   ``select FILE_PATH from default.ice_orc.files where `partition` = 
named_struct("event_src_trunc","BBB","event_time_month",655);``
   
   `2024-08` needs to be converted to `655`.
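
   To make the arithmetic concrete: Iceberg's `month` transform stores the 
number of months since 1970-01, so 2024-08 is (2024 - 1970) * 12 + 7 = 655. A 
minimal sketch of that conversion follows; the class and method names are 
hypothetical and this is not the code used in the PR.

```java
import java.time.YearMonth;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, for illustration only (not part of Hive or Iceberg).
class MonthTransformSketch {

    // Iceberg's month transform counts months starting from 1970-01.
    private static final YearMonth EPOCH = YearMonth.of(1970, 1);

    // Converts a human-readable "yyyy-MM" partition value to months since epoch.
    static int monthsSinceEpoch(String yearMonth) {
        YearMonth ym = YearMonth.parse(yearMonth); // expects e.g. "2024-08"
        return (int) ChronoUnit.MONTHS.between(EPOCH, ym);
    }

    public static void main(String[] args) {
        System.out.println(monthsSinceEpoch("2024-08")); // prints 655
    }
}
```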
   
   I am listing the partitions table to find the spec id and partition struct 
for the given `ci.partName`, then applying conversions to get values that can 
be used in the condition on the `partition` field of the `files` metadata table.
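
   As a rough illustration of the gap between the two formats, the sketch 
below splits a `ci.partName`-style string into its fields and renders 
already-converted values as a `named_struct(...)` expression. It is a 
simplified assumption, not what the PR does (the PR resolves the struct via the 
partitions table): the class and method names are hypothetical, the converted 
month value is hard-coded, and escaping of special characters in partition 
names or values is ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, for illustration only (not part of Hive or Iceberg).
class PartitionFilterSketch {

    // Splits "k1=v1/k2=v2" into an ordered map of field name -> readable value.
    static Map<String, String> parsePartName(String partName) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String part : partName.split("/")) {
            int eq = part.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed component: " + part);
            }
            fields.put(part.substring(0, eq), part.substring(eq + 1));
        }
        return fields;
    }

    // Renders converted values as named_struct("name",value,...).
    // Numbers are emitted bare; everything else is double-quoted (no escaping).
    static String toNamedStruct(Map<String, Object> converted) {
        StringJoiner sj = new StringJoiner(",", "named_struct(", ")");
        for (Map.Entry<String, Object> e : converted.entrySet()) {
            sj.add("\"" + e.getKey() + "\"");
            Object v = e.getValue();
            sj.add(v instanceof Number ? v.toString() : "\"" + v + "\"");
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        Map<String, String> raw =
            parsePartName("event_src_trunc=BBB/event_time_month=2024-08");

        Map<String, Object> converted = new LinkedHashMap<>();
        converted.put("event_src_trunc", raw.get("event_src_trunc"));
        converted.put("event_time_month", 655); // 2024-08 as months since 1970-01

        System.out.println(toNamedStruct(converted));
        // prints named_struct("event_src_trunc","BBB","event_time_month",655)
    }
}
```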
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

