RussellSpitzer opened a new issue #3419:
URL: https://github.com/apache/iceberg/issues/3419
An example:
```java
sql("CREATE OR REPLACE TABLE %s USING iceberg PARTITIONED BY (part) AS " +
    "SELECT id, data, CASE WHEN (id %% 2) = 0 THEN 'even' ELSE 'odd' END AS part " +
    "FROM %s ORDER BY 3, 1", tableName, sourceName);
Table table = Spark3Util.loadIcebergTable(spark, tableName);
assertEquals("Should have rows matching the source table",
    sql("SELECT id, data, CASE WHEN (id %% 2) = 0 THEN 'even' ELSE 'odd' END AS part " +
        "FROM %s ORDER BY id", sourceName),
    sql("SELECT * FROM %s ORDER BY id", tableName));
table.updateSpec().removeField("part").commit();
sql("CREATE OR REPLACE TABLE %s USING iceberg PARTITIONED BY (part) AS " +
    "SELECT 2 * id as id, data, CASE WHEN ((2 * id) %% 2) = 0 THEN 'even' ELSE 'odd' END AS part " +
    "FROM %s ORDER BY 3, 1", tableName, sourceName);
```
Fails with
```
Cannot use partition name more than once: part
java.lang.IllegalArgumentException: Cannot use partition name more than once: part
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:217)
	at org.apache.iceberg.PartitionSpec$Builder.checkAndAddPartitionName(PartitionSpec.java:353)
```
This should be impossible, since the new replacement table should have no
knowledge of the old partition spec.
This also manifests in a number of other odd errors. For example, if the
replacement table is unpartitioned, it still carries partition spec details
from the previous table.
For example, say I instead replace the table like this:
```
sql("CREATE OR REPLACE TABLE %s USING iceberg AS " +
    "SELECT 2 * id as id, data, CASE WHEN ((2 * id) %% 2) = 0 THEN 'even' ELSE 'odd' END AS part " +
    "FROM %s ORDER BY 3, 1", tableName, sourceName);
```
If we check the spec of the replaced table, it contains:
```
[
1000: part: void(3)
]
```
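To make the suspected mechanism concrete, here is a minimal, self-contained sketch. It is a toy model and not Iceberg code: `StaleSpecSketch`, `Field`, `buggyReplace`, and `cleanReplace` are hypothetical names made up for illustration, and the assumption that the replace path carries old fields forward is only a guess consistent with the leftover `void` field above. The only detail taken from the report is the error message of the name-uniqueness check.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these types are Iceberg classes.
public class StaleSpecSketch {

  // A partition field reduced to the parts visible in the spec output above.
  static final class Field {
    final int fieldId;
    final String name;
    final String transform;

    Field(int fieldId, String name, String transform) {
      this.fieldId = fieldId;
      this.name = name;
      this.transform = transform;
    }

    @Override
    public String toString() {
      return fieldId + ": " + name + ": " + transform;
    }
  }

  // Mimics the name-uniqueness check seen in the stack trace.
  static List<Field> buildSpec(List<Field> fields) {
    Set<String> names = new HashSet<>();
    for (Field field : fields) {
      if (!names.add(field.name)) {
        throw new IllegalArgumentException(
            "Cannot use partition name more than once: " + field.name);
      }
    }
    return new ArrayList<>(fields);
  }

  // Assumed buggy behavior: old fields leak into the replacement spec.
  static List<Field> buggyReplace(List<Field> oldSpec, List<Field> requested) {
    List<Field> merged = new ArrayList<>(oldSpec);
    merged.addAll(requested);
    return buildSpec(merged);
  }

  // Expected behavior: the replacement spec is built only from the new statement.
  static List<Field> cleanReplace(List<Field> oldSpec, List<Field> requested) {
    return buildSpec(requested);
  }

  public static void main(String[] args) {
    // State after removeField("part"): the field lingers as a void transform.
    List<Field> oldSpec = List.of(new Field(1000, "part", "void"));
    // What the second CREATE OR REPLACE ... PARTITIONED BY (part) asks for.
    List<Field> requested = List.of(new Field(1000, "part", "identity"));

    System.out.println(cleanReplace(oldSpec, requested));
    try {
      buggyReplace(oldSpec, requested);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In this model the clean path builds a spec from the requested field alone, while the leaking path hits the same duplicate-name message as the first failure in this report.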
I've noticed this can crop up in a lot of weird ways. For example, a user
showed me a table that had been created using CTAS and ended up with the
following error:
```
Caused by: org.apache.iceberg.exceptions.ValidationException: Conflicting partition fields: ['1000: X_ID: identity(2)', '1000: X_bucket: bucket[1](1)']
```
It looks like the initial partition spec identifier got reset, which then
allowed him to add two transforms with the same ID.
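As a rough illustration of that guess, the sketch below models field ID assignment with a counter. It is a toy model, not Iceberg code: `FieldIdResetSketch`, `assign`, and `conflicts` are hypothetical names, and the idea that a replace resets the last-assigned ID is an assumption, not something confirmed here. The starting ID of 1000 and the two field descriptions come from the error above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not Iceberg code.
public class FieldIdResetSketch {

  // Hands out IDs after lastAssignedId to each new field, in order.
  static Map<String, Integer> assign(int lastAssignedId, List<String> newFields) {
    Map<String, Integer> ids = new LinkedHashMap<>();
    int next = lastAssignedId;
    for (String field : newFields) {
      ids.put(field, ++next);
    }
    return ids;
  }

  // Lists pairs of distinct fields that ended up sharing one ID.
  static List<String> conflicts(Map<String, Integer> existing, Map<String, Integer> added) {
    List<String> found = new ArrayList<>();
    for (Map.Entry<String, Integer> e : existing.entrySet()) {
      for (Map.Entry<String, Integer> a : added.entrySet()) {
        if (e.getValue().equals(a.getValue()) && !e.getKey().equals(a.getKey())) {
          found.add(e.getValue() + ": " + e.getKey() + " vs " + a.getKey());
        }
      }
    }
    return found;
  }

  public static void main(String[] args) {
    // The first field gets ID 1000 (the counter starts just below it).
    Map<String, Integer> existing = assign(999, List.of("X_ID: identity(2)"));

    // Counter tracked correctly: the next field gets 1001, no conflict.
    System.out.println(conflicts(existing, assign(1000, List.of("X_bucket: bucket[1](1)"))));

    // Counter reset (the assumed bug): the next field gets 1000 again.
    System.out.println(conflicts(existing, assign(999, List.of("X_bucket: bucket[1](1)"))));
  }
}
```

With the counter preserved the second field receives a fresh ID; with the counter reset both fields share ID 1000, which is the shape of the conflict in the error above.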
I'm pretty sure the issue is inside the replace/create catalog code. I
actually debugged this a month or two ago but apparently forgot to file an
issue or log it.