[GitHub] [iceberg] szehon-ho commented on a diff in pull request #4812: Spark 3.2: Support reading position deletes

GitBox Tue, 12 Jul 2022 16:11:00 -0700


szehon-ho commented on code in PR #4812:
URL: https://github.com/apache/iceberg/pull/4812#discussion_r919500148



##########
core/src/main/java/org/apache/iceberg/io/DeleteSchemaUtil.java:
##########
@@ -20,13 +20,38 @@
 package org.apache.iceberg.io;
 
 import org.apache.iceberg.MetadataColumns;
+import org.apache.iceberg.Partitioning;
 import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.types.TypeUtil;
 import org.apache.iceberg.types.Types;
 
 public class DeleteSchemaUtil {
   private DeleteSchemaUtil() {
   }
 
+  public static Schema metadataTableSchema(Table table) {
+    return metadataTableSchema(table, Partitioning.partitionType(table));
+  }
+
+  public static Schema metadataTableSchema(Table table, Types.StructType partitionType) {
+    Schema result = new Schema(
+        MetadataColumns.DELETE_FILE_PATH,

Review Comment:
   Actually, looking into this: if I don't reuse the metadata column, I have
   to change a lot of code, for example:
   
   - BaseDataReader.constantsMap will need a new method to populate the
     partition based on the new partition ID, which I'll have to pass in
     based on which field is selected. (Currently it works because both IDs
     are the same.)
   - All the code in the file-type readers that removes the metadata columns
     from the selected columns and does error checks. Currently this uses
     MetadataColumns.metadataFieldIds(), which would need to become something
     like MetadataColumns.metadataAndConstantFieldIds() to take the new
     column into account. Even though this is not a metadata column per se,
     it is still a constant column that should not be selected from the
     content file.
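
   To make the second point concrete, here is a rough standalone sketch (not
   code from the PR). It assumes the readers filter selected field IDs
   against a set of constant IDs. The class name, the numeric IDs, and
   metadataAndConstantFieldIds() are all hypothetical stand-ins, and the
   local metadataFieldIds() only mimics the role of
   MetadataColumns.metadataFieldIds().

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, self-contained illustration; it does not use Iceberg classes.
final class ConstantFieldSketch {
  // Illustrative IDs only; these are NOT the real Iceberg reserved field IDs.
  static final int FILE_PATH_ID = 1001;
  static final int PARTITION_ID = 1002;
  // Assumed ID for a new, non-metadata partition column on the metadata table.
  static final int NEW_PARTITION_ID = 2001;

  private ConstantFieldSketch() {
  }

  // Stand-in for the role played by MetadataColumns.metadataFieldIds().
  static Set<Integer> metadataFieldIds() {
    return Set.of(FILE_PATH_ID, PARTITION_ID);
  }

  // Hypothetical replacement that also covers the new constant column.
  static Set<Integer> metadataAndConstantFieldIds() {
    Set<Integer> ids = new LinkedHashSet<>(metadataFieldIds());
    ids.add(NEW_PARTITION_ID);
    return ids;
  }

  // Keep only the IDs that must actually be read from the content file,
  // dropping every ID that is filled in from constants instead.
  static List<Integer> idsToReadFromFile(List<Integer> selected, Set<Integer> constantIds) {
    return selected.stream()
        .filter(id -> !constantIds.contains(id))
        .collect(Collectors.toList());
  }
}
```

   With only metadataFieldIds(), the new column's ID would stay in the list
   passed to the file reader, which is the case the readers would have to
   handle if the metadata column is not reused.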



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


