kbendick commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r779880513
##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow conflicts between partition and schema field name as
-        // long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {
Review comment:
Agreed, it does look cleaner in the builder!
Question for my own full understanding: is it correct to say that we can
ignore `checkConflict` when we know there is no conflict? For example, in the
transformSpec, can we set `checkConflict` to `false` because we're already
adding `partitionPrefix`?
##########
File path: core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java
##########
@@ -459,6 +460,65 @@ public void testDataFilesTableSelection() throws IOException {
     Assert.assertEquals(expected, scan.schema().asStruct());
   }
+  @Test
+  public void testPartitionColumnNamedPartition() throws Exception {
+    TestTables.clearTables();
+    this.tableDir = temp.newFolder();
+    tableDir.delete();
+
+    Schema schema = new Schema(
+        required(1, "id", Types.IntegerType.get()),
+        // create a column named "partition", to recreate the problem in
+        // https://github.com/apache/iceberg/issues/3709
+        required(2, "partition", Types.IntegerType.get())
+    );
+    this.metadataDir = new File(tableDir, "metadata");
+    PartitionSpec spec = PartitionSpec.builderFor(schema)
+        .identity("partition")
+        .build();
+
+    DataFile par0 = DataFiles.builder(spec)
+        .withPath("/path/to/data-0.parquet")
+        .withFileSizeInBytes(10)
+        .withPartition(TestHelpers.Row.of(0))
+        .withRecordCount(1)
+        .build();
+    DataFile par1 = DataFiles.builder(spec)
+        .withPath("/path/to/data-0.parquet")
+        .withFileSizeInBytes(10)
+        .withPartition(TestHelpers.Row.of(1))
+        .withRecordCount(1)
+        .build();
+    DataFile par2 = DataFiles.builder(spec)
+        .withPath("/path/to/data-0.parquet")
+        .withFileSizeInBytes(10)
+        .withPartition(TestHelpers.Row.of(2))
+        .withRecordCount(1)
+        .build();
+
+    this.table = create(schema, spec);
+    table.newFastAppend()
+        .appendFile(par0)
+        .commit();
+    table.newFastAppend()
+        .appendFile(par1)
+        .commit();
+    table.newFastAppend()
+        .appendFile(par2)
+        .commit();
+
+    Table partitionsTable = new PartitionsTable(table.ops(), table);
+
+    Expression andEquals = Expressions.and(
+        Expressions.equal("partition.partition", 0),
+        Expressions.greaterThan("record_count", 0));
+    TableScan scanAndEq = partitionsTable.newScan().filter(andEquals);
+    CloseableIterable<FileScanTask> tasksAndEq = PartitionsTable.planFiles((StaticTableScan) scanAndEq);
+    Assert.assertEquals(1, Iterators.size(tasksAndEq.iterator()));
+    validateIncludesPartitionScan(tasksAndEq, 0);
+    TestTables.clearTables();
+  }
Review comment:
Nit: Does this remove the `metadataDir` on exit / completion?
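For example (purely a sketch of what I mean, assuming the JUnit 4 lifecycle
this class already uses; the hook name is mine), an `@After` hook would
guarantee the directory is removed even when an assertion fails:
```java
@After
public void cleanupTestTable() throws IOException {
  TestTables.clearTables();
  // Recursively delete the table directory, which also removes the
  // metadata dir created inside it, so nothing leaks across test methods.
  if (tableDir != null && tableDir.exists()) {
    try (Stream<Path> paths = Files.walk(tableDir.toPath())) {
      paths.sorted(Comparator.reverseOrder())
          .map(Path::toFile)
          .forEach(File::delete);
    }
  }
}
```
(Uses `java.nio.file.Files`/`Path`, `java.util.Comparator`, and
`java.util.stream.Stream`.) If the `TemporaryFolder` rule already cleans this
up on its own, feel free to ignore.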
##########
File path: core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java
##########
@@ -459,6 +460,65 @@ public void testDataFilesTableSelection() throws IOException {
     Assert.assertEquals(expected, scan.schema().asStruct());
   }
+  @Test
+  public void testPartitionColumnNamedPartition() throws Exception {
+    TestTables.clearTables();
+    this.tableDir = temp.newFolder();
+    tableDir.delete();
+
+    Schema schema = new Schema(
+        required(1, "id", Types.IntegerType.get()),
+        // create a column named "partition", to recreate the problem in
+        // https://github.com/apache/iceberg/issues/3709
Review comment:
Nit: This comment feels a little redundant given the test name. Unlike
the Spark repo, we don't usually reference the issue when adding a test (though
I think that's a good practice in general).
Maybe just `// See https://github.com/apache/iceberg/issues/3709` towards
the top of the test if you feel it's necessary?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]