Re: [PR] HIVE-29637: Iceberg: Incorrect Results for NULL Predicate on Partitio… [hive]

via GitHub Mon, 01 Jun 2026 10:35:49 -0700


zabetak commented on code in PR #6515:
URL: https://github.com/apache/hive/pull/6515#discussion_r3335831587



##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##########
@@ -2316,8 +2316,8 @@ public List<Partition> 
getPartitionsByExpr(org.apache.hadoop.hive.ql.metadata.Ta
             PartitionData partitionData = 
IcebergTableUtil.toPartitionData(task.partition(), spec.partitionType());
             String partName = spec.partitionToPath(partitionData);
 
-            Map<String, String> partSpecMap = Maps.newLinkedHashMap();
-            Warehouse.makeSpecFromName(partSpecMap, new Path(partName), null);

Review Comment:
   There are more calls to `Warehouse#makeSpecFromName` methods in 
`IcebergTableUtil#convertNameToMetastorePartition`, `IcebergQueryCompactor`, 
and potentially other places as well. Do we need to update them as well?



##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java:
##########
@@ -452,6 +452,25 @@ public static void performMetadataDelete(Table 
icebergTable, String branchName,
     deleteFiles.deleteFromRowFilter(exp).commit();
   }
 
+  /**
+   * Parses an Iceberg partition path into a Hive-compatible spec map.

Review Comment:
   The method somewhat "overrides" the default behavior of 
`Warehouse.makeSpecFromName` but it's unclear why this specificity is required 
for Iceberg especially since we want to create a "Hive-compatible" spec map.



##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java:
##########
@@ -452,6 +452,25 @@ public static void performMetadataDelete(Table 
icebergTable, String branchName,
     deleteFiles.deleteFromRowFilter(exp).commit();
   }
 
+  /**
+   * Parses an Iceberg partition path into a Hive-compatible spec map.
+   * Unlike {@link Warehouse#makeSpecFromName}, this correctly represents null 
partition values
+   * as {@code null} instead of the literal string "null".
+   */
+  public static Map<String, String> makeSpecFromName(String partName,
+      PartitionSpec spec, PartitionData data) {
+    Map<String, String> partSpecMap = Maps.newLinkedHashMap();
+    Warehouse.makeSpecFromName(partSpecMap, new Path(partName), null);
+
+    List<PartitionField> fields = spec.fields();
+    for (int i = 0; i < fields.size(); i++) {
+      if (data.get(i, Object.class) == null) {
+        partSpecMap.put(fields.get(i).name(), null);

Review Comment:
   If we can alter the `partSpecMap` based on the `PartitionSpec` then why do 
we even need to pass through the `Warehouse.makeSpecForName` method? Can we 
create the spec map exclusively using the Iceberg APIs?



##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java:
##########
@@ -452,6 +452,25 @@ public static void performMetadataDelete(Table 
icebergTable, String branchName,
     deleteFiles.deleteFromRowFilter(exp).commit();
   }
 
+  /**
+   * Parses an Iceberg partition path into a Hive-compatible spec map.
+   * Unlike {@link Warehouse#makeSpecFromName}, this correctly represents null 
partition values
+   * as {@code null} instead of the literal string "null".

Review Comment:
   Based on the comment I get the impression that for other table formats where 
we use the string "null" things work fine so I am wondering why this is only a 
problem for Iceberg?



##########
iceberg/iceberg-handler/src/test/queries/positive/iceberg_isnull_partition_pruning.q:
##########
@@ -0,0 +1,38 @@
+CREATE TABLE sample01 (
+  index INT,
+  date_col DATE,
+  timestamp_col TIMESTAMP,
+  str_col VARCHAR(24),
+  string_col STRING,
+  double_col DOUBLE,
+  float_col FLOAT,
+  decimal_col DECIMAL(9,3),
+  tinyint_col TINYINT,
+  smallint_col SMALLINT,
+  int_col INT,
+  bigint_col BIGINT,
+  boolean_col BOOLEAN
+);
+
+INSERT INTO sample01 VALUES
+(1003,"1969-10-27","1993-05-17 07:39:58.375409",NULL,"sloppy bronze 
hare",-181.01933598375618,-181.019336,-999999.999,-128,-32768,-2147483648,-9223372036854775808,false);
+
+CREATE EXTERNAL TABLE ice01 (
+  index INT,
+  date_col DATE,
+  timestamp_col TIMESTAMP,
+  string_col STRING,
+  double_col DOUBLE,
+  float_col FLOAT,
+  decimal_col DECIMAL(9,3),
+  smallint_col int,
+  int_col INT,
+  bigint_col BIGINT,
+  boolean_col BOOLEAN
+) PARTITIONED BY (str_col String, tinyint_col int)
+STORED BY iceberg;
+
+INSERT INTO ice01 PARTITION (str_col, tinyint_col)
+  SELECT index, date_col, timestamp_col, string_col, double_col, float_col, 
decimal_col, smallint_col, int_col, bigint_col, boolean_col, str_col, 
tinyint_col FROM sample01;
+
+SELECT * FROM ice01 WHERE str_col is NULL;

Review Comment:
   Should I add tests for partition with nulls on other type of columns?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HIVE-29637: Iceberg: Incorrect Results for NULL Predicate on Partitio… [hive]

Reply via email to