shawnding opened a new issue #729: SparkBatchWrite partitioned writes should require correctly grouped rows
URL: https://github.com/apache/incubator-iceberg/issues/729

Create a partitioned table with this statement:

```sql
CREATE TABLE test (id Int, data String) PARTITIONED BY (data)
```

Then write data into Iceberg through SparkBatchWrite like this:

```java
StringBuilder base = new StringBuilder();
Random rnd = new Random();
for (int i = 0; i < 5000; i++) {
    char c = (char) (rnd.nextInt(26) + 'a');
    base.append("(").append(i).append(", '").append(c).append("'),");
}
spark.sql("INSERT INTO " + CATALOG_DB_TABLE + " VALUES " + base + "(1, 'a')");
```

The generated `VALUES` list cannot guarantee that rows are grouped by `data`, so Iceberg throws an `IllegalStateException` from this check:

```java
if (completedPartitions.contains(key)) {
  // if rows are not correctly grouped, detect and fail the write
  PartitionKey existingKey = Iterables.find(completedPartitions, key::equals, null);
  LOG.warn("Duplicate key: {} == {}", existingKey, key);
  throw new IllegalStateException("Already closed files for partition: " + key.toPath());
}
```
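To illustrate why ungrouped rows trip that check, here is a minimal sketch (not Iceberg's actual classes; the class and method names are hypothetical) of the invariant the writer enforces: it writes one file per partition, closes that file as soon as the partition key changes, and fails if an already-closed partition reappears later in the row stream.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GroupedWriteCheck {

    // Returns true if this sequence of partition keys would trigger the
    // "Already closed files for partition" failure: i.e. a partition key
    // shows up again after its file has already been closed.
    public static boolean detectsUngrouped(List<String> partitionKeys) {
        Set<String> completedPartitions = new HashSet<>();
        String currentKey = null;
        for (String key : partitionKeys) {
            if (key.equals(currentKey)) {
                continue; // still appending to the current partition's file
            }
            if (completedPartitions.contains(key)) {
                return true; // rows were not grouped: closed partition reappeared
            }
            if (currentKey != null) {
                completedPartitions.add(currentKey); // close the previous file
            }
            currentKey = key;
        }
        return false;
    }
}
```

For example, the key sequence `a, b, a` fails while `a, a, b` does not, which is why sorting the input by the partition column before the write avoids the exception.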
