RussellSpitzer opened a new issue #3419:
URL: https://github.com/apache/iceberg/issues/3419


   An example
   
   ```java
       sql("CREATE OR REPLACE TABLE %s USING iceberg PARTITIONED BY (part) AS " +
               "SELECT id, data, CASE WHEN (id %% 2) = 0 THEN 'even' ELSE 'odd' END AS part " +
               "FROM %s ORDER BY 3, 1", tableName, sourceName);
   
       Table table = Spark3Util.loadIcebergTable(spark, tableName);
   
       assertEquals("Should have rows matching the source table",
               sql("SELECT id, data, CASE WHEN (id %% 2) = 0 THEN 'even' ELSE 'odd' END AS part " +
                       "FROM %s ORDER BY id", sourceName),
               sql("SELECT * FROM %s ORDER BY id", tableName));
   
       table.updateSpec().removeField("part").commit();
   
       sql("CREATE OR REPLACE TABLE %s USING iceberg PARTITIONED BY (part) AS " +
               "SELECT 2 * id as id, data, CASE WHEN ((2 * id) %% 2) = 0 THEN 'even' ELSE 'odd' END AS part " +
               "FROM %s ORDER BY 3, 1", tableName, sourceName);
   ```
   
   Fails with:
   
   ```
   Cannot use partition name more than once: part
   java.lang.IllegalArgumentException: Cannot use partition name more than once: part
        at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:217)
        at org.apache.iceberg.PartitionSpec$Builder.checkAndAddPartitionName(PartitionSpec.java:353)
   ```
   
   This should be impossible, since the new replacement table should have no knowledge of the old partitioning spec.
   
   This also manifests in a number of other odd errors. For example, if we create the replacement table unpartitioned, it still carries partition spec details from the previous table.
   
   For example, say I instead replace the table like this:
   
   ```
       sql("CREATE OR REPLACE TABLE %s USING iceberg AS " +
               "SELECT 2 * id as id, data, CASE WHEN ((2 * id) %% 2) = 0 THEN 'even' ELSE 'odd' END AS part " +
               "FROM %s ORDER BY 3, 1", tableName, sourceName);
   ```
   
   If we then check the table's spec, it contains:
   
   ```
   [
     1000: part: void(3)
   ]
   ```
   
   I've noticed this can crop up in a lot of odd ways. For example, a user showed me a table that had been created using CTAS and ended up with the following error:
   
   ```
   Caused by: org.apache.iceberg.exceptions.ValidationException: Conflicting partition fields: ['1000: X_ID: identity(2)', '1000: X_bucket: bucket[1](1)']
   ```
   
   Here it looks like the initial partition spec identifier got reset, which then allowed him to add two transforms with the same ID.
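   
   To make that suspected mechanism concrete, here is a minimal, self-contained sketch. It is a toy model, not Iceberg's code: the `Field` class, `addField`, and `hasConflictingIds` are hypothetical names, and the idea that a "last assigned ID" counter is what gets reset is an assumption based on the error above, where both fields carry ID 1000.
   
   ```java
   import java.util.ArrayList;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Set;
   
   public class FieldIdCollisionSketch {
       // Hypothetical stand-in for a partition field; NOT Iceberg's PartitionField.
       static final class Field {
           final int id;
           final String name;
   
           Field(int id, String name) {
               this.id = id;
               this.name = name;
           }
       }
   
       // The partition field IDs in the errors above start at 1000.
       static final int FIRST_FIELD_ID = 1000;
   
       // Returns a copy of `fields` plus one new field whose ID is lastAssignedId + 1.
       static List<Field> addField(List<Field> fields, int lastAssignedId, String name) {
           List<Field> result = new ArrayList<>(fields);
           result.add(new Field(lastAssignedId + 1, name));
           return result;
       }
   
       // True when at least two fields share the same ID.
       static boolean hasConflictingIds(List<Field> fields) {
           Set<Integer> seen = new HashSet<>();
           for (Field f : fields) {
               if (!seen.add(f.id)) {
                   return true;
               }
           }
           return false;
       }
   
       public static void main(String[] args) {
           // The original spec has one field, X_ID, with ID 1000.
           List<Field> spec = addField(new ArrayList<>(), FIRST_FIELD_ID - 1, "X_ID");
   
           // Counter tracked correctly: the next field gets ID 1001.
           List<Field> ok = addField(spec, FIRST_FIELD_ID, "X_bucket");
   
           // Counter reset (assumed failure mode): the next field gets ID 1000 again.
           List<Field> broken = addField(spec, FIRST_FIELD_ID - 1, "X_bucket");
   
           System.out.println(hasConflictingIds(ok));     // prints false
           System.out.println(hasConflictingIds(broken)); // prints true
       }
   }
   ```
   
   In this model the old field survives the replace while the counter starts over, which is enough to produce two fields with ID 1000. Whether the real replace path behaves this way still needs to be confirmed in the catalog code.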
   
   I'm pretty sure the issue is inside the replace/create catalog code. I actually debugged this a month or two ago but apparently forgot to file an issue or log it...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


