adnanhb opened a new issue #2192: URL: https://github.com/apache/iceberg/issues/2192
Hello, I have a Spark application which writes Parquet files and then reads them into an Iceberg table using Spark. The Spark job fails with this error: `elements should be required, but are optional`. The schema I am using has two primitive fields and one array containing a record with a single primitive field (a minimal sketch of that shape is at the end of this issue).

After a little searching, I came across a past issue which is very similar: https://github.com/apache/iceberg/issues/510. I think that fixed the nullability check at the field level, but not for lists or any other complex type (like maps).

I have written a test which reproduces the error. It can be found here: https://gist.github.com/adnanhb/3348687751643e6034158ca5a0c47832

The stack trace of the error is as follows:

```
Problems:
* entitlements: elements should be required, but are optional
    at org.apache.iceberg.types.TypeUtil.validateWriteSchema(TypeUtil.java:263)
    at org.apache.iceberg.spark.source.SparkWriteBuilder.buildForBatch(SparkWriteBuilder.java:103)
    at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.run(WriteToDataSourceV2Exec.scala:259)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:39)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:39)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.doExecute(V2CommandExec.scala:54)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:122)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:121)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:944)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:944)
```

I am thinking that we just need to apply the checkNullability check (added as part of issue 510) to list types as well (in CheckCompatibility.java, line 164). If that is agreed, I can try to submit a PR for it (a rough sketch of what I mean is below). Thanks.
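To illustrate, here is a rough sketch of the shape the change could take. It is hypothetical, not the actual contents of CheckCompatibility.java; it assumes the `checkNullability` flag from #510 is threaded through to the list visitor, and that the visitor has both the table's list type and the incoming write schema's list type in hand:

```java
import java.util.List;
import org.apache.iceberg.types.Types;

// Hypothetical helper showing the proposed list-level check. In the real
// visitor, `tableList` would be the table schema's list type, `writeList`
// the write schema's list type, and `checkNullability` the flag that
// issue #510 introduced for field-level checks.
static void checkListNullability(Types.ListType tableList, Types.ListType writeList,
                                 boolean checkNullability, List<String> errors) {
  // Only report the element-optionality mismatch when nullability checking
  // is enabled, mirroring the field-level behavior from issue #510.
  if (checkNullability && tableList.isElementRequired() && writeList.isElementOptional()) {
    errors.add(": elements should be required, but are optional");
  }
}
```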

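For reference, a minimal sketch of the schema shape involved. The `entitlements` name comes from the stack trace above; the other field names are made up, and the full reproducing test is in the gist:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;

// Hypothetical table schema matching the shape described in the report:
// two primitive fields plus an array whose elements are a record with a
// single primitive field. The table declares the list elements as required,
// while the Parquet data written by Spark carries them as optional, which
// is the mismatch that validateWriteSchema reports.
static final Schema TABLE_SCHEMA = new Schema(
    Types.NestedField.required(1, "id", Types.LongType.get()),
    Types.NestedField.required(2, "name", Types.StringType.get()),
    Types.NestedField.required(3, "entitlements",
        Types.ListType.ofRequired(4,
            Types.StructType.of(
                Types.NestedField.required(5, "code", Types.StringType.get())))));
```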