[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3862: Spark: Supports partition management in V2 Catalog

GitBox Tue, 15 Feb 2022 10:42:48 -0800


RussellSpitzer commented on a change in pull request #3862:
URL: https://github.com/apache/iceberg/pull/3862#discussion_r806300891




##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java
##########
@@ -272,6 +283,65 @@ public void deleteWhere(Filter[] filters) {
     }
   }
 
+  @Override
+  public StructType partitionSchema() {
+    return (StructType) SparkSchemaUtil.convert(Partitioning.partitionType(table()));
+  }
+
+  @Override
+  public void createPartition(InternalRow ident, Map<String, String> properties) throws UnsupportedOperationException {
+    throw new UnsupportedOperationException("Cannot explicitly create partitions in Iceberg tables");
+  }
+
+  @Override
+  public boolean dropPartition(InternalRow ident) {
+    throw new UnsupportedOperationException("Cannot explicitly drop partitions in Iceberg tables");
+  }
+
+  @Override
+  public void replacePartitionMetadata(InternalRow ident, Map<String, String> properties)
+          throws UnsupportedOperationException {
+    throw new UnsupportedOperationException("Iceberg partitions do not support metadata");
+  }
+
+  @Override
+  public Map<String, String> loadPartitionMetadata(InternalRow ident) throws UnsupportedOperationException {
+    throw new UnsupportedOperationException("Iceberg partitions do not support metadata");
+  }
+
+  @Override
+  public InternalRow[] listPartitionIdentifiers(String[] names, InternalRow ident) {
+    // support show partitions
+    List<InternalRow> rows = Lists.newArrayList();
+    Dataset<Row> df = SparkTableUtil.loadMetadataTable(sparkSession(), icebergTable, MetadataTableType.PARTITIONS);
+    if (names.length > 0) {
+      StructType schema = partitionSchema();
+      df.collectAsList().forEach(row -> {
+        GenericRowWithSchema genericRow = (GenericRowWithSchema) row.apply(0);
+        boolean exits = true;
+        int index = 0;
+        while (index < names.length) {
+          DataType dataType = schema.apply(names[index]).dataType();

Review comment:
   It looks like we are trying to align the metadata table schema with the
current table schema. I think we should still just display the metadata table
partition values as is, but if we choose to go this route, I think we still
have an issue here.
   
   Consider a table with the following history:
   ```
   Add Partition Column Identity (a)
   Remove Partition Column identity (a)
   Drop Column a
   Add Column a
   Add partition Column Identity (a)
   ```
   
   This should result in a row that has multiple "a" fields in the partition
spec (at least I believe this is the current behavior). We should make sure we
project columns correctly in those cases. I think it is also OK for this to be
just a light wrapper around the partitions metadata table that lists the
partitions in the extended schema it provides.

   
   I guess this may be a little odd for unpartitioned tables, since they may
still show that partitions exist, but this is probably more accurate ...
   @jackye1995 and @szehon-ho, any thoughts?
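
To make the projection concern concrete, here is a minimal, self-contained sketch. It assumes the partition struct ends up with two fields that are both named "a", as described above. The `Field` class, the helper methods, and the IDs 1000 and 1001 are hypothetical stand-ins for illustration, not Iceberg or Spark APIs. It shows that a lookup by name matches more than one position, while a lookup by field ID does not.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionProjection {
  /** Hypothetical stand-in for a partition field: a unique ID plus a display name. */
  static final class Field {
    final int id;
    final String name;

    Field(int id, String name) {
      this.id = id;
      this.name = name;
    }
  }

  /** Assumed outcome of the history above: two partition fields that share the name "a". */
  static List<Field> sampleFields() {
    List<Field> fields = new ArrayList<>();
    fields.add(new Field(1000, "a")); // identity(a) that was added and later removed
    fields.add(new Field(1001, "a")); // identity(a) on the re-added column
    return fields;
  }

  /** Returns every position whose field name matches; more than one result means ambiguity. */
  static List<Integer> positionsByName(List<Field> fields, String name) {
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < fields.size(); i++) {
      if (fields.get(i).name.equals(name)) {
        positions.add(i);
      }
    }
    return positions;
  }

  /** Returns the single position with the given field ID, or -1 if it is absent. */
  static int positionById(List<Field> fields, int id) {
    for (int i = 0; i < fields.size(); i++) {
      if (fields.get(i).id == id) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    List<Field> fields = sampleFields();
    System.out.println("positions for name 'a': " + positionsByName(fields, "a"));
    System.out.println("position for id 1001: " + positionById(fields, 1001));
  }
}
```

If the duplicate-name behavior holds, any code path that resolves partition columns by name alone would need either an ID-based lookup like `positionById` or an explicit rule for which "a" wins.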




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


