[GitHub] [iceberg] kbendick commented on issue #2692: [Spark] NullPointerException error when attempting to do vectorized read of Parquet file with unsupported encoding

GitBox Tue, 15 Jun 2021 00:48:48 -0700


kbendick commented on issue #2692:
URL: https://github.com/apache/iceberg/issues/2692#issuecomment-861269643



   As a starting point, for the Spark vectorized Parquet reader, I think we 
should explicitly throw when we either
   - (1) encounter an encoding that’s not supported, or
   - (2) encounter a Parquet v2 file at read time.
   
   
   I think that approach 2 would potentially be simpler and more in line with 
the code from Spark, which has explicit V1 and V2 paths for data pages, 
footers, etc. (and which we modeled this class on).
   
   Spark 3.1.1 afaik does not support vectorized reading of files written with 
the Parquet v2 format, though it seems to be in the works.
   
   A more helpful error message would go a long way until we’ve updated the 
Spark vectorized Parquet reader to support files written in both the Parquet 
v1 and Parquet v2 formats.
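
   To make the two options concrete, here is a rough, self-contained sketch of 
what the checks could look like. Everything in it is hypothetical: the class 
name, the `Encoding` and `PageVersion` enums (stand-ins for the real Parquet 
types), the set of supported encodings, and the message wording are assumptions 
for illustration, not the actual Iceberg or Spark code.

```java
import java.util.EnumSet;
import java.util.Set;

public class VectorizedReadGuard {
  // Hypothetical stand-in for Parquet's encoding type; only a few names are listed.
  enum Encoding { PLAIN, PLAIN_DICTIONARY, RLE, RLE_DICTIONARY, DELTA_BINARY_PACKED, DELTA_BYTE_ARRAY }

  // Hypothetical stand-in for the data page version found at read time.
  enum PageVersion { V1, V2 }

  // Assumption: the encodings the vectorized path can decode. The real set may differ.
  static final Set<Encoding> SUPPORTED =
      EnumSet.of(Encoding.PLAIN, Encoding.PLAIN_DICTIONARY, Encoding.RLE, Encoding.RLE_DICTIONARY);

  // Option (1): fail fast with a descriptive message instead of a later NullPointerException.
  static void checkEncoding(Encoding encoding) {
    if (!SUPPORTED.contains(encoding)) {
      throw new UnsupportedOperationException(
          "Cannot do a vectorized read of a column with unsupported encoding: " + encoding);
    }
  }

  // Option (2): reject Parquet v2 data pages outright until a V2 path exists.
  static void checkPageVersion(PageVersion version) {
    if (version == PageVersion.V2) {
      throw new UnsupportedOperationException(
          "Cannot do a vectorized read of Parquet v2 data pages; disable vectorization to read this file");
    }
  }

  public static void main(String[] args) {
    checkEncoding(Encoding.PLAIN);     // passes silently
    checkPageVersion(PageVersion.V1);  // passes silently
    try {
      checkEncoding(Encoding.DELTA_BINARY_PACKED);
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

   Either check would be called before any page is decoded, so the user sees 
an actionable message rather than an NPE from deep inside the reader.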


