singhpk234 opened a new pull request, #5179: URL: https://github.com/apache/iceberg/pull/5179
### About the changes

Addresses https://github.com/apache/iceberg/pull/5094#discussion_r902008258

Spark was writing a 3-level list rather than the 2-level list expected by the [UT](https://github.com/apache/iceberg/blob/7e1ade80397a54c24db3634cb6015d539143e62d/spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/actions/TestCreateActions.java#L640-L703).

On debugging this further, I found that the schema was passed via `spark.read().schema(sparkSchema).json`, and as of Spark 3.3, Spark does not by default respect the nullability of a schema passed this way (ref. [this](https://github.com/apache/iceberg/pull/5094)). Since the nullability is not respected, the fields are treated as nullable, and the Parquet writer writes a 3-level list despite `writeLegacyParquetFormat` being true. [CodePointer](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala#L376-L410)

This PR adds the conf to respect the provided nullability, which preserves the existing behaviour.

P.S.: A good long-term fix would be to stop specifying schemas this way in our tests / test utils.

----

### Testing Done

Re-enabled the UT, which was ignored during the version upgrade.
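To make the writer behaviour concrete, here is a sketch of the two Parquet list encodings involved. Per the Parquet format spec, the legacy (2-level) layout can only represent non-null elements, so once Spark treats the element as nullable it must fall back to the standard 3-level layout even with `writeLegacyParquetFormat=true`. The column and element names below are illustrative, not taken from the test:

```
// Standard 3-level list encoding -- what Spark emits when the
// element type is considered nullable:
optional group tags (LIST) {
  repeated group list {
    optional binary element (STRING);
  }
}

// Legacy 2-level encoding -- what writeLegacyParquetFormat=true
// produces when the element is known to be non-null:
optional group tags (LIST) {
  repeated binary array (STRING);
}
```

For reference, the Spark 3.3 flag governing this behaviour for text-based sources is, I believe, `spark.sql.legacy.respectNullabilityInTextDatasetConversion`; setting it to `true` restores the pre-3.3 behaviour of honouring the user-supplied nullability.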
