singhpk234 opened a new pull request, #5179: URL: https://github.com/apache/iceberg/pull/5179
### About the changes

Addresses https://github.com/apache/iceberg/pull/5094#discussion_r902008258

Spark was writing a 3-level list rather than the 2-level list expected by the [UT](https://github.com/apache/iceberg/blob/7e1ade80397a54c24db3634cb6015d539143e62d/spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/actions/TestCreateActions.java#L640-L703).

On debugging this further, I found that the schema was passed via `spark.read().schema(sparkSchema).json`, and as of Spark 3.3, Spark does not by default respect the nullability of a schema passed this way (ref. [this](https://github.com/apache/iceberg/pull/5094)). Since the nullability is not respected, the fields are treated as nullable, and the Parquet writer writes a 3-level list despite `writeLegacyParquetFormat` being true. [CodePointer](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala#L376-L410)

This PR adds the conf to respect the provided nullability, which preserves the existing behaviour.

P.S.: A good long-term fix would be to stop specifying schemas this way in our tests / test utils.

----

### Testing Done

Re-enabled the UT, which was ignored during the version upgrade.
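To make the writer behaviour concrete, here is a sketch of the two Parquet list encodings involved. Per the Parquet format spec, the legacy (2-level) layout can only represent non-null elements, so once Spark treats the element as nullable it must fall back to the standard 3-level layout even with `writeLegacyParquetFormat=true`. The column and element names below are illustrative, not taken from the test:

```
// Standard 3-level list encoding -- what Spark emits when the
// element type is considered nullable:
optional group tags (LIST) {
  repeated group list {
    optional binary element (STRING);
  }
}

// Legacy 2-level encoding -- what writeLegacyParquetFormat=true
// produces when the element is known to be non-null:
optional group tags (LIST) {
  repeated binary array (STRING);
}
```

For reference, the Spark 3.3 flag governing this behaviour for text-based sources is, I believe, `spark.sql.legacy.respectNullabilityInTextDatasetConversion`; setting it to `true` restores the pre-3.3 behaviour of honouring the user-supplied nullability.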
