[GitHub] [iceberg] samarthjain commented on pull request #2740: [Parquet] Throw Better Exception with Vectorized Parquet V2 Format

GitBox Fri, 25 Jun 2021 10:25:58 -0700


samarthjain commented on pull request #2740:
URL: https://github.com/apache/iceberg/pull/2740#issuecomment-868719981



   @RussellSpitzer - I am hoping we can find a better solution here. I am 
generally not a fan of catching NPEs :) 
   
   There are a few other approaches possible here:
   1) Parquet V2 actually isn't that well tested. Later versions of Trino, 
though, have started writing Parquet files in the V2 format. We hit this 
issue with Iceberg vectorized reads when we upgraded our Presto clusters to the 
Trino 350 release. We worked around it by reintroducing the older Parquet 
write path in Trino, which writes Parquet V1 files.
    
   2) To fix this in Iceberg, we should either:
    - look into supporting vectorized reads for V2 files, or
    - disable vectorized reads when (or if) we can detect that the Parquet 
files are in V2 format.
    
   I can take a look at 2). 
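A minimal sketch of the detection idea in 2), in Python for illustration only (Iceberg itself is Java, and this is not its API). It assumes that the encodings listed for each column chunk in the Parquet footer can serve as a proxy for "V2-style" data, since a V2 writer may emit delta encodings. The supported-encoding set `VECTORIZED_SUPPORTED`, the names `choose_reader`, `check_vectorizable`, and `UnsupportedVectorizedReadError` are all hypothetical. The sketch shows both options: fall back to a row-based reader, or fail fast with a descriptive error instead of an NPE-style failure.

```python
from enum import Enum
from typing import Dict, FrozenSet, Mapping


class Encoding(Enum):
    """Subset of Parquet column encodings, as listed in column chunk footer metadata."""
    PLAIN = "PLAIN"
    PLAIN_DICTIONARY = "PLAIN_DICTIONARY"
    RLE_DICTIONARY = "RLE_DICTIONARY"
    RLE = "RLE"
    BIT_PACKED = "BIT_PACKED"
    DELTA_BINARY_PACKED = "DELTA_BINARY_PACKED"
    DELTA_LENGTH_BYTE_ARRAY = "DELTA_LENGTH_BYTE_ARRAY"
    DELTA_BYTE_ARRAY = "DELTA_BYTE_ARRAY"


# ASSUMPTION (hypothetical): encodings the vectorized reader can decode.
# Delta encodings, which a V2 writer may produce, are assumed unsupported.
VECTORIZED_SUPPORTED: FrozenSet[Encoding] = frozenset({
    Encoding.PLAIN,
    Encoding.PLAIN_DICTIONARY,
    Encoding.RLE_DICTIONARY,
    Encoding.RLE,
    Encoding.BIT_PACKED,
})


class UnsupportedVectorizedReadError(Exception):
    """Hypothetical descriptive error raised instead of letting a null dereference surface."""


def unsupported_encodings(
    column_encodings: Mapping[str, FrozenSet[Encoding]],
) -> Dict[str, FrozenSet[Encoding]]:
    """Map each column to the encodings the vectorized reader cannot handle (clean columns omitted)."""
    result: Dict[str, FrozenSet[Encoding]] = {}
    for column, encodings in column_encodings.items():
        missing = frozenset(encodings) - VECTORIZED_SUPPORTED
        if missing:
            result[column] = missing
    return result


def choose_reader(
    column_encodings: Mapping[str, FrozenSet[Encoding]],
    vectorization_enabled: bool = True,
) -> str:
    """Option A: silently fall back to the row-based reader when any column is unsupported."""
    if vectorization_enabled and not unsupported_encodings(column_encodings):
        return "vectorized"
    return "row"


def check_vectorizable(column_encodings: Mapping[str, FrozenSet[Encoding]]) -> None:
    """Option B: fail fast with a message naming the offending columns and encodings."""
    bad = unsupported_encodings(column_encodings)
    if bad:
        details = "; ".join(
            f"{column}: {', '.join(sorted(e.value for e in encs))}"
            for column, encs in sorted(bad.items())
        )
        raise UnsupportedVectorizedReadError(
            f"Cannot use vectorized reads, unsupported encodings ({details}). "
            "Disable vectorized reads to read this file."
        )


# Usage with made-up footer metadata for a single column "id".
v1_style = {"id": frozenset({Encoding.PLAIN_DICTIONARY, Encoding.RLE, Encoding.BIT_PACKED})}
v2_style = {"id": frozenset({Encoding.DELTA_BINARY_PACKED, Encoding.RLE})}
print(choose_reader(v1_style), choose_reader(v2_style))
```

Checking footer metadata before reading lets the reader choose a path up front, which avoids catching an NPE after the fact; whether to fall back silently or raise is a policy choice.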
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

