mehtaashish23 opened a new issue #886: Out of order fields commit fails in Iceberg
URL: https://github.com/apache/incubator-iceberg/issues/886
 
 
   Currently, writing to an Iceberg table requires the input data's fields to be
in the same order as the table's schema. For instance, in the gist below, the
table has fields `f1, f2`; if I try to write a DataFrame whose fields are
ordered `f2, f1`, the write fails with the error below. This gets harder with
complex nested fields, where the child fields can appear in any order.
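To illustrate the failure mode, here is a minimal sketch of a purely positional order check, written in Python for brevity. It is hypothetical: `find_out_of_order` is not Iceberg's `validateWriteSchema`, and it assumes schemas are plain lists of top-level field names and that every write field exists in the table. It flags a field that the write schema places before another field even though the table orders them the other way around.

```python
def find_out_of_order(table_fields, write_fields):
    """Hypothetical positional check (not Iceberg's implementation).

    Returns one message per write field that appears after a field the
    table schema places later. Assumes every write field is in the table.
    """
    position = {name: i for i, name in enumerate(table_fields)}
    problems = []
    last_name, last_pos = None, -1
    for name in write_fields:
        pos = position[name]
        if pos < last_pos:
            # `last_name` was written before `name`, but the table wants `name` first.
            problems.append(f"{last_name} is out of order, before {name}")
        else:
            last_name, last_pos = name, pos
    return problems


print(find_out_of_order(["f1", "f2"], ["f2", "f1"]))
# ['f2 is out of order, before f1']
```

A check like this compares positions only, so writing the same fields in a different order is rejected even when every name matches.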
   
   In some scenarios (like a Spark-based ETL process) where the DataFrame comes
from an upstream system or component, reordering its fields to match Iceberg
before writing is outside the scope of the writer code. NOTE: There is no
DataFrame API that reorders the fields into the order Iceberg expects.
   
   Expectation: The client shouldn't have to reorder the fields before writing
to Iceberg.
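The expected behavior amounts to matching fields by name rather than by position, including inside nested structs. A minimal sketch, again hypothetical and in Python: `reorder` rebuilds a record (modeled as a nested dict) in the table's field order, recursing into struct fields. It assumes every table field is present in the input and does not handle lists or maps; it is not an Iceberg or Spark API.

```python
def reorder(record, schema):
    """Hypothetical name-based reordering (not an Iceberg or Spark API).

    `schema` is a list of (name, children) pairs, where `children` is None
    for a primitive field or another such list for a nested struct.
    Returns a new dict whose keys follow the schema order at every level.
    """
    out = {}
    for name, children in schema:
        value = record[name]  # look up by name, not by position
        out[name] = reorder(value, children) if children is not None else value
    return out


table_schema = [("f1", None), ("f2", [("a", None), ("b", None)])]
incoming = {"f2": {"b": 2, "a": 1}, "f1": "x"}  # top-level and nested order both differ
result = reorder(incoming, table_schema)
print(list(result), list(result["f2"]))
# ['f1', 'f2'] ['a', 'b']
```

Resolving fields by name at each level is what would let a writer accept `f2, f1` (or shuffled child fields) without the client reordering anything.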
   
   GIST for reference: 
https://gist.github.com/mehtaashish23/2c694890931283ff3add2711bf57c1b0
   
   Error:
   ```
   Problems:
   f2 is out of order, before f1
     at org.apache.iceberg.spark.source.IcebergSource.validateWriteSchema(IcebergSource.java:208)
     at org.apache.iceberg.spark.source.IcebergSource.createWriter(IcebergSource.java:104)
     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:254)
     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
     ... 65 elided
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.