zabetak commented on code in PR #6515:
URL: https://github.com/apache/hive/pull/6515#discussion_r3335831587
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##########
@@ -2316,8 +2316,8 @@ public List<Partition>
getPartitionsByExpr(org.apache.hadoop.hive.ql.metadata.Ta
PartitionData partitionData =
IcebergTableUtil.toPartitionData(task.partition(), spec.partitionType());
String partName = spec.partitionToPath(partitionData);
- Map<String, String> partSpecMap = Maps.newLinkedHashMap();
- Warehouse.makeSpecFromName(partSpecMap, new Path(partName), null);
Review Comment:
There are more calls to `Warehouse#makeSpecFromName` methods in
`IcebergTableUtil#convertNameToMetastorePartition`, `IcebergQueryCompactor`,
and potentially other places as well. Do we need to update them as well?
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java:
##########
@@ -452,6 +452,25 @@ public static void performMetadataDelete(Table
icebergTable, String branchName,
deleteFiles.deleteFromRowFilter(exp).commit();
}
+ /**
+ * Parses an Iceberg partition path into a Hive-compatible spec map.
Review Comment:
The method somewhat "overrides" the default behavior of
`Warehouse.makeSpecFromName` but it's unclear why this specificity is required
for Iceberg especially since we want to create a "Hive-compatible" spec map.
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java:
##########
@@ -452,6 +452,25 @@ public static void performMetadataDelete(Table
icebergTable, String branchName,
deleteFiles.deleteFromRowFilter(exp).commit();
}
+ /**
+ * Parses an Iceberg partition path into a Hive-compatible spec map.
+ * Unlike {@link Warehouse#makeSpecFromName}, this correctly represents null
partition values
+ * as {@code null} instead of the literal string "null".
+ */
+ public static Map<String, String> makeSpecFromName(String partName,
+ PartitionSpec spec, PartitionData data) {
+ Map<String, String> partSpecMap = Maps.newLinkedHashMap();
+ Warehouse.makeSpecFromName(partSpecMap, new Path(partName), null);
+
+ List<PartitionField> fields = spec.fields();
+ for (int i = 0; i < fields.size(); i++) {
+ if (data.get(i, Object.class) == null) {
+ partSpecMap.put(fields.get(i).name(), null);
Review Comment:
If we can alter the `partSpecMap` based on the `PartitionSpec` then why do
we even need to pass through the `Warehouse.makeSpecForName` method? Can we
create the spec map exclusively using the Iceberg APIs?
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java:
##########
@@ -452,6 +452,25 @@ public static void performMetadataDelete(Table
icebergTable, String branchName,
deleteFiles.deleteFromRowFilter(exp).commit();
}
+ /**
+ * Parses an Iceberg partition path into a Hive-compatible spec map.
+ * Unlike {@link Warehouse#makeSpecFromName}, this correctly represents null
partition values
+ * as {@code null} instead of the literal string "null".
Review Comment:
Based on the comment I get the impression that for other table formats where
we use the string "null" things work fine so I am wondering why this is only a
problem for Iceberg?
##########
iceberg/iceberg-handler/src/test/queries/positive/iceberg_isnull_partition_pruning.q:
##########
@@ -0,0 +1,38 @@
+CREATE TABLE sample01 (
+ index INT,
+ date_col DATE,
+ timestamp_col TIMESTAMP,
+ str_col VARCHAR(24),
+ string_col STRING,
+ double_col DOUBLE,
+ float_col FLOAT,
+ decimal_col DECIMAL(9,3),
+ tinyint_col TINYINT,
+ smallint_col SMALLINT,
+ int_col INT,
+ bigint_col BIGINT,
+ boolean_col BOOLEAN
+);
+
+INSERT INTO sample01 VALUES
+(1003,"1969-10-27","1993-05-17 07:39:58.375409",NULL,"sloppy bronze
hare",-181.01933598375618,-181.019336,-999999.999,-128,-32768,-2147483648,-9223372036854775808,false);
+
+CREATE EXTERNAL TABLE ice01 (
+ index INT,
+ date_col DATE,
+ timestamp_col TIMESTAMP,
+ string_col STRING,
+ double_col DOUBLE,
+ float_col FLOAT,
+ decimal_col DECIMAL(9,3),
+ smallint_col int,
+ int_col INT,
+ bigint_col BIGINT,
+ boolean_col BOOLEAN
+) PARTITIONED BY (str_col String, tinyint_col int)
+STORED BY iceberg;
+
+INSERT INTO ice01 PARTITION (str_col, tinyint_col)
+ SELECT index, date_col, timestamp_col, string_col, double_col, float_col,
decimal_col, smallint_col, int_col, bigint_col, boolean_col, str_col,
tinyint_col FROM sample01;
+
+SELECT * FROM ice01 WHERE str_col is NULL;
Review Comment:
Should I add tests for partition with nulls on other type of columns?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]