adnanhb opened a new issue #2192: URL: https://github.com/apache/iceberg/issues/2192
Hello, I have a Spark application which writes Parquet files and then reads them into an Iceberg table using Spark. The Spark job fails with this error: `elements should be required, but are optional`. The schema I am using has two primitive fields and one array containing a record with a single primitive field (a minimal sketch of that shape is at the end of this issue).

After a little searching, I came across a past issue which is very similar: https://github.com/apache/iceberg/issues/510. I think that fixed the nullability check at the field level, but not for lists or any other complex type (like maps).

I have written a test which reproduces the error. It can be found here: https://gist.github.com/adnanhb/3348687751643e6034158ca5a0c47832

The stack trace of the error is as follows:

```
Problems:
* entitlements: elements should be required, but are optional
    at org.apache.iceberg.types.TypeUtil.validateWriteSchema(TypeUtil.java:263)
    at org.apache.iceberg.spark.source.SparkWriteBuilder.buildForBatch(SparkWriteBuilder.java:103)
    at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.run(WriteToDataSourceV2Exec.scala:259)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:39)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:39)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.doExecute(V2CommandExec.scala:54)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:122)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:121)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:944)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:944)
```

I am thinking that we just need to apply the checkNullability check (added as part of issue 510) to list types as well (in CheckCompatibility.java, line 164). If that is agreed, I can try to submit a PR for it (a rough sketch of what I mean is below). Thanks.
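To illustrate, here is a rough sketch of the shape the change could take. It is hypothetical, not the actual contents of CheckCompatibility.java; it assumes the `checkNullability` flag from #510 is threaded through to the list visitor, and that the visitor has both the table's list type and the incoming write schema's list type in hand:

```java
import java.util.List;
import org.apache.iceberg.types.Types;

// Hypothetical helper showing the proposed list-level check. In the real
// visitor, `tableList` would be the table schema's list type, `writeList`
// the write schema's list type, and `checkNullability` the flag that
// issue #510 introduced for field-level checks.
static void checkListNullability(Types.ListType tableList, Types.ListType writeList,
                                 boolean checkNullability, List<String> errors) {
  // Only report the element-optionality mismatch when nullability checking
  // is enabled, mirroring the field-level behavior from issue #510.
  if (checkNullability && tableList.isElementRequired() && writeList.isElementOptional()) {
    errors.add(": elements should be required, but are optional");
  }
}
```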

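For reference, a minimal sketch of the schema shape involved. The `entitlements` name comes from the stack trace above; the other field names are made up, and the full reproducing test is in the gist:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;

// Hypothetical table schema matching the shape described in the report:
// two primitive fields plus an array whose elements are a record with a
// single primitive field. The table declares the list elements as required,
// while the Parquet data written by Spark carries them as optional, which
// is the mismatch that validateWriteSchema reports.
static final Schema TABLE_SCHEMA = new Schema(
    Types.NestedField.required(1, "id", Types.LongType.get()),
    Types.NestedField.required(2, "name", Types.StringType.get()),
    Types.NestedField.required(3, "entitlements",
        Types.ListType.ofRequired(4,
            Types.StructType.of(
                Types.NestedField.required(5, "code", Types.StringType.get())))));
```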