kbendick commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r779880513
##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow conflicts between partition and schema field name as
-        // long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {
Review comment:
Agreed, it does look cleaner in the builder!
Question for my own full understanding: is it correct to say that we can
ignore `checkConflict` when we know there is no conflict? For example, in the
transformSpec, can we set `checkConflict` to `false` because we're already
adding `partitionPrefix`?
##########
File path: core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java
##########
@@ -459,6 +460,65 @@ public void testDataFilesTableSelection() throws IOException {
     Assert.assertEquals(expected, scan.schema().asStruct());
   }
+  @Test
+  public void testPartitionColumnNamedPartition() throws Exception {
+    TestTables.clearTables();
+    this.tableDir = temp.newFolder();
+    tableDir.delete();
+
+    Schema schema = new Schema(
+        required(1, "id", Types.IntegerType.get()),
+        // create a column named "partition", to recreate the problem in
+        // https://github.com/apache/iceberg/issues/3709
+        required(2, "partition", Types.IntegerType.get())
+    );
+    this.metadataDir = new File(tableDir, "metadata");
+    PartitionSpec spec = PartitionSpec.builderFor(schema)
+        .identity("partition")
+        .build();
+
+    DataFile par0 = DataFiles.builder(spec)
+        .withPath("/path/to/data-0.parquet")
+        .withFileSizeInBytes(10)
+        .withPartition(TestHelpers.Row.of(0))
+        .withRecordCount(1)
+        .build();
+    DataFile par1 = DataFiles.builder(spec)
+        .withPath("/path/to/data-0.parquet")
+        .withFileSizeInBytes(10)
+        .withPartition(TestHelpers.Row.of(1))
+        .withRecordCount(1)
+        .build();
+    DataFile par2 = DataFiles.builder(spec)
+        .withPath("/path/to/data-0.parquet")
+        .withFileSizeInBytes(10)
+        .withPartition(TestHelpers.Row.of(2))
+        .withRecordCount(1)
+        .build();
+
+    this.table = create(schema, spec);
+    table.newFastAppend()
+        .appendFile(par0)
+        .commit();
+    table.newFastAppend()
+        .appendFile(par1)
+        .commit();
+    table.newFastAppend()
+        .appendFile(par2)
+        .commit();
+
+    Table partitionsTable = new PartitionsTable(table.ops(), table);
+
+    Expression andEquals = Expressions.and(
+        Expressions.equal("partition.partition", 0),
+        Expressions.greaterThan("record_count", 0));
+    TableScan scanAndEq = partitionsTable.newScan().filter(andEquals);
+    CloseableIterable<FileScanTask> tasksAndEq = PartitionsTable.planFiles((StaticTableScan) scanAndEq);
+    Assert.assertEquals(1, Iterators.size(tasksAndEq.iterator()));
+    validateIncludesPartitionScan(tasksAndEq, 0);
+    TestTables.clearTables();
+  }
Review comment:
Nit: Does this remove the `metadataDir` on exit / completion?
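For example (purely a sketch of what I mean, assuming the JUnit 4 lifecycle
this class already uses; the hook name is mine), an `@After` hook would
guarantee the directory is removed even when an assertion fails:
```java
@After
public void cleanupTestTable() throws IOException {
  TestTables.clearTables();
  // Recursively delete the table directory, which also removes the
  // metadata dir created inside it, so nothing leaks across test methods.
  if (tableDir != null && tableDir.exists()) {
    try (Stream<Path> paths = Files.walk(tableDir.toPath())) {
      paths.sorted(Comparator.reverseOrder())
          .map(Path::toFile)
          .forEach(File::delete);
    }
  }
}
```
(Uses `java.nio.file.Files`/`Path`, `java.util.Comparator`, and
`java.util.stream.Stream`.) If the `TemporaryFolder` rule already cleans this
up on its own, feel free to ignore.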
##########
File path: core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java
##########
@@ -459,6 +460,65 @@ public void testDataFilesTableSelection() throws IOException {
     Assert.assertEquals(expected, scan.schema().asStruct());
   }
+  @Test
+  public void testPartitionColumnNamedPartition() throws Exception {
+    TestTables.clearTables();
+    this.tableDir = temp.newFolder();
+    tableDir.delete();
+
+    Schema schema = new Schema(
+        required(1, "id", Types.IntegerType.get()),
+        // create a column named "partition", to recreate the problem in
+        // https://github.com/apache/iceberg/issues/3709
Review comment:
Nit: This comment feels a little redundant given the test name. Unlike
the Spark repo, we don't usually reference the issue when adding a test (though
I think that's a good practice in general).
Maybe just `// See https://github.com/apache/iceberg/issues/3709` towards
the top of the test if you feel it's necessary?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]