shawnding opened a new issue #729: SparkBatchWrite partitioned write requires rows to be correctly grouped
URL: https://github.com/apache/incubator-iceberg/issues/729
 
 
   When creating a table with this statement:
   `CREATE TABLE test(id Int, data String) PARTITIONED BY (data)`
   
   and then using SparkBatchWrite to write data into Iceberg like this:
   ```
   Random rnd = new Random();
   StringBuilder Base = new StringBuilder();
   
   for (int i = 0; i < 5000; i++) {
     // random partition value, so rows are not grouped by `data`
     char c = (char) (rnd.nextInt(26) + 'a');
     Base.append("(").append(i).append(", '").append(c).append("'),");
   }
   
   spark.sql("INSERT INTO " + CATALOG_DB_TABLE + " VALUES " + Base + "(1, 'a')");
   ```
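   For contrast, rows would arrive grouped if the generated values were sorted by the partition column before building the `INSERT`. A minimal plain-Java sketch (no Spark dependencies; `buildGroupedValues` is a hypothetical helper, not part of Iceberg or Spark):

   ```java
   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.List;
   import java.util.Random;

   public class GroupedValues {
     // Builds a VALUES clause with rows sorted by the partition column `data`,
     // so all rows with the same partition value are contiguous.
     static String buildGroupedValues(int n, long seed) {
       Random rnd = new Random(seed);
       List<String[]> rows = new ArrayList<>();
       for (int i = 0; i < n; i++) {
         char c = (char) (rnd.nextInt(26) + 'a');
         rows.add(new String[] {Integer.toString(i), String.valueOf(c)});
       }
       // group by partition value: rows with equal `data` become adjacent
       rows.sort(Comparator.comparing(r -> r[1]));
       StringBuilder sb = new StringBuilder();
       for (String[] r : rows) {
         if (sb.length() > 0) {
           sb.append(", ");
         }
         sb.append("(").append(r[0]).append(", '").append(r[1]).append("')");
       }
       return sb.toString();
     }

     public static void main(String[] args) {
       String values = buildGroupedValues(5000, 42L);
       // spark.sql("INSERT INTO " + CATALOG_DB_TABLE + " VALUES " + values);
       System.out.println(values.length());
     }
   }
   ```

   With this ordering the writer never sees a partition key again after closing its files, so the check below would not trigger.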
   The values in `Base` are not guaranteed to be grouped by `data`, so Iceberg throws an `IllegalStateException` from this check:
   
   ```
   if (completedPartitions.contains(key)) {
     // if rows are not correctly grouped, detect and fail the write
     PartitionKey existingKey = Iterables.find(completedPartitions, key::equals, null);
     LOG.warn("Duplicate key: {} == {}", existingKey, key);
     throw new IllegalStateException("Already closed files for partition: " + key.toPath());
   }
   ```
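   The detection amounts to remembering which partition keys have already been closed and failing when one reappears. A stripped-down sketch of the same logic (hypothetical `PartitionGuard` class with `String` keys, not Iceberg's actual `PartitionKey` machinery):

   ```java
   import java.util.HashSet;
   import java.util.Set;

   public class PartitionGuard {
     private final Set<String> completedPartitions = new HashSet<>();
     private String currentKey = null;

     // Called once per row with that row's partition key. Throws if a
     // partition whose files were already closed is seen again, i.e. the
     // incoming rows were not grouped by partition key.
     void write(String key) {
       if (!key.equals(currentKey)) {
         if (currentKey != null) {
           // switching partitions: close the previous partition's files
           completedPartitions.add(currentKey);
         }
         if (completedPartitions.contains(key)) {
           throw new IllegalStateException("Already closed files for partition: " + key);
         }
         currentKey = key;
       }
     }
   }
   ```

   Feeding it the sequence `a, a, b, a` reproduces the failure: the second run of `a` arrives after `a`'s files were closed.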

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
