kbendick commented on issue #2692:
URL: https://github.com/apache/iceberg/issues/2692#issuecomment-861269643


   As a starting point, for the Spark vectorized Parquet reader, I think we 
should explicitly throw when we either
   - (1) encounter an encoding that’s not supported, or
   - (2) encounter a Parquet v2 file at read time.
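   Option (1) amounts to a fail-fast guard before decoding a column. The 
sketch below is hypothetical: the `Encoding` enum and the set of "supported" 
encodings are illustrative stand-ins, not Iceberg's or parquet-mr's actual 
API or actual support matrix.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of option (1). The types below are stand-ins,
// not the real Iceberg or parquet-mr classes.
public class EncodingGuard {

  // Illustrative subset of Parquet encodings.
  enum Encoding { PLAIN, PLAIN_DICTIONARY, RLE_DICTIONARY, DELTA_BINARY_PACKED, DELTA_BYTE_ARRAY }

  // Assumed set of encodings the vectorized reader can decode.
  static final Set<Encoding> SUPPORTED =
      EnumSet.of(Encoding.PLAIN, Encoding.PLAIN_DICTIONARY, Encoding.RLE_DICTIONARY);

  // Throws a descriptive error up front instead of failing later with an
  // obscure decoding exception.
  static void checkEncoding(String column, Encoding encoding) {
    if (!SUPPORTED.contains(encoding)) {
      throw new UnsupportedOperationException(
          "Cannot read column '" + column + "' with the vectorized Parquet reader: "
              + "unsupported encoding " + encoding);
    }
  }

  public static void main(String[] args) {
    checkEncoding("id", Encoding.PLAIN); // supported, returns normally
    try {
      checkEncoding("ts", Encoding.DELTA_BINARY_PACKED);
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

   The weakness of this approach is that the allow-list has to be kept in 
sync with whatever the decoder actually handles.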
   
   
   I think approach 2 would likely be simpler and more in line with the 
code from Spark, which has explicit V1 and V2 paths for data pages, footers, 
etc. (and which we modeled this class on).
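   A rough sketch of approach 2, assuming the reader dispatches on the data 
page version the way Spark's reader splits V1 and V2 page handling. 
`DataPage`, `DataPageV1`, `DataPageV2`, and `readPageV1` here are hypothetical 
stand-ins defined inline, not the real parquet-mr or Iceberg classes; the 
point is only that the V2 branch throws a clear error instead of misreading 
the page.

```java
// Hypothetical sketch of approach 2: an explicit V1 path and a V2 path
// that rejects the page up front. All types are illustrative stand-ins.
public class PageDispatch {

  abstract static class DataPage {
    final int valueCount;

    DataPage(int valueCount) {
      this.valueCount = valueCount;
    }
  }

  static final class DataPageV1 extends DataPage {
    DataPageV1(int valueCount) {
      super(valueCount);
    }
  }

  static final class DataPageV2 extends DataPage {
    DataPageV2(int valueCount) {
      super(valueCount);
    }
  }

  // Dispatches on the page version; only V1 pages are decoded.
  static int readPage(String column, DataPage page) {
    if (page instanceof DataPageV1) {
      return readPageV1((DataPageV1) page);
    }
    if (page instanceof DataPageV2) {
      throw new UnsupportedOperationException(
          "Cannot read column '" + column + "' with the vectorized Parquet reader: "
              + "Parquet v2 data pages are not supported yet");
    }
    throw new IllegalStateException("Unknown data page type: " + page.getClass().getName());
  }

  // Placeholder for the real V1 decoding path; returns the number of values read.
  static int readPageV1(DataPageV1 page) {
    return page.valueCount;
  }
}
```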
   
   As far as I know, Spark 3.1.1 does not support vectorized reading of files 
written with the Parquet v2 format, though support seems to be in the works.
   
   A more helpful error message would go a long way until we’ve updated the 
Spark vectorized Parquet reader to support both the Parquet v1 and v2 write 
formats.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


