[GitHub] [iceberg] kbendick commented on a change in pull request #3723: Fix Iceberg's parquet writer returning nulls incorrectly for parquet files written by writers that don't use list and element as names.

GitBox Mon, 13 Dec 2021 11:38:55 -0800


kbendick commented on a change in pull request #3723:
URL: https://github.com/apache/iceberg/pull/3723#discussion_r768064340




##########
File path: site/docs/spark-procedures.md
##########
@@ -334,6 +334,10 @@ CALL catalog_name.system.rewrite_manifests('db.sample', 
false)
 
 The `snapshot` and `migrate` procedures help test and migrate existing Hive or 
Spark tables to Iceberg.
 
+**Note** Parquet files written with Parquet writers that use names other than 
`list` and `element` for repeated group
+and element of the list respectively are **read incorrectly** by Iceberg upto 
0.12.1 Iceberg versions. Parquet files
+generated by Hive fall in this category.

Review comment:
       I checked 3.x and it's still called `bag`.
   
   I suggest we consider going with something similar to the map projection 
element rename fix that Ryan applied for maps from Parquet 1.11.0 to 1.11.1, 
where we don't depend on the name at all.
   
   PR for context: https://github.com/apache/iceberg/pull/3309
   
   It might not work because that was for projection, but if we can get 
something like that where we don't assume names, that would ideally allow for N 
levels of nesting etc. cc @SinghAsDev 
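
   To illustrate what resolving the element by structure rather than by name could look like, here is a minimal Python sketch. It assumes only the 3-level list layout (a LIST-annotated group holding one repeated group holding one element). `Node`, `list_element`, `leaf_element`, and `make_list` are hypothetical names for illustration, not Iceberg or parquet-mr APIs, and `array_element` is just an example of a non-standard element name.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical, simplified stand-in for a Parquet schema node."""
    name: str
    repetition: str                      # "required", "optional", or "repeated"
    children: List["Node"] = field(default_factory=list)
    is_list: bool = False                # True if the group is annotated as LIST


def list_element(list_node: Node) -> Node:
    """Return the element of a 3-level LIST group by position, never by name."""
    if not list_node.is_list or len(list_node.children) != 1:
        raise ValueError(f"{list_node.name}: not a single-child LIST group")
    repeated = list_node.children[0]
    if repeated.repetition != "repeated" or len(repeated.children) != 1:
        raise ValueError(f"{list_node.name}: expected one repeated group with one child")
    return repeated.children[0]


def leaf_element(list_node: Node) -> Node:
    """Unwrap any number of nested lists until a non-list element is reached."""
    element = list_element(list_node)
    while element.is_list:
        element = list_element(element)
    return element


def make_list(name: str, repeated_name: str, element: Node) -> Node:
    """Build a 3-level list whose repeated group uses an arbitrary name."""
    repeated = Node(repeated_name, "repeated", [element])
    return Node(name, "optional", [repeated], is_list=True)


# Same structure, different names: both resolve without any name check.
standard = make_list("ids", "list", Node("element", "optional"))
hive_style = make_list("ids", "bag", Node("array_element", "optional"))

# Two levels of nesting with non-standard names throughout.
nested = make_list(
    "outer", "bag",
    make_list("array_element", "bag", Node("array_element", "optional")),
)
```

   Because nothing here compares against `list`, `element`, or `bag`, the same lookup applies to each level of a nested list. Older 2-level list encodings are outside what this sketch covers and would need their own handling.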




