mehtaashish23 opened a new issue #886: Out of order fields commit fails in Iceberg
URL: https://github.com/apache/incubator-iceberg/issues/886

Currently, writing to an Iceberg table requires the fields of the input data to be in the same order as in the table's schema. For instance, in the gist below the table has fields `f1, f2` registered, and writing a data frame whose fields are ordered `f2, f1` fails with the error below.

This becomes more complex with nested fields, where the child fields can arrive in any order. In some scenarios (such as a Spark-based ETL process) the data frame comes from a dependent system/component, and it is out of scope for the writer code to re-order the fields to align with Iceberg before writing.

NOTE: There is no API on DataFrame for re-ordering the fields into the order Iceberg expects.

Expectation: The client shouldn't have to re-order the fields before writing to Iceberg.

GIST for reference: https://gist.github.com/mehtaashish23/2c694890931283ff3add2711bf57c1b0

Error:
```
Problems: f2 is out of order, before f1
  at org.apache.iceberg.spark.source.IcebergSource.validateWriteSchema(IcebergSource.java:208)
  at org.apache.iceberg.spark.source.IcebergSource.createWriter(IcebergSource.java:104)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:254)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
  ... 65 elided
```
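
To make the expectation concrete, here is a minimal sketch of name-based alignment: fields are matched to the table schema by name rather than by position, recursing into nested structs. It is only an illustration in plain Python and is not Iceberg or Spark code. The schema representation (ordered lists of name/type pairs) and the function name `align_to_schema` are hypothetical and chosen for this example.

```python
from typing import Any, List, Tuple, Union

# Hypothetical schema model: a struct is an ordered list of
# (field_name, field_type) pairs, where field_type is either a
# primitive type name or a nested struct in the same form.
FieldType = Union[str, List[Tuple[str, Any]]]
Schema = List[Tuple[str, FieldType]]


def align_to_schema(record: dict, schema: Schema) -> dict:
    """Return a copy of `record` whose keys follow the order of `schema`."""
    expected = {name for name, _ in schema}
    missing = [name for name, _ in schema if name not in record]
    extra = [name for name in record if name not in expected]
    if missing or extra:
        raise ValueError(f"missing fields: {missing}, unexpected fields: {extra}")

    aligned = {}
    for name, field_type in schema:
        value = record[name]
        # Recurse into nested structs so child fields are aligned too.
        if isinstance(field_type, list) and value is not None:
            value = align_to_schema(value, field_type)
        aligned[name] = value
    return aligned


# Table registers f1, f2; incoming data arrives as f2, f1,
# with the children of f2 also out of order.
table_schema = [("f1", "string"), ("f2", [("a", "int"), ("b", "int")])]
incoming = {"f2": {"b": 2, "a": 1}, "f1": "x"}
aligned = align_to_schema(incoming, table_schema)
```

The values are unchanged; only the field order differs from the input. A writer that resolved fields this way could accept the `f2, f1` data frame from the gist without the client re-ordering anything.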
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
