dmgcodevil edited a comment on issue #12597: URL: https://github.com/apache/arrow/issues/12597#issuecomment-1064360092
@jorisvandenbossche, the files that PyArrow successfully reads were written by the Spark/Iceberg data source and Iceberg's [ParquetWriter](https://github.com/apache/iceberg/blob/master/parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java). The files that PyArrow fails to read were written via the Trino Iceberg catalog (connector). In theory, both Trino and Spark should use Iceberg's ParquetWriter, which internally uses the Hadoop Parquet writer. What I've found is that some columns are encoded as DELTA_BYTE_ARRAY. Does PyArrow support this encoding? I know that fastparquet does not. I also found this [ticket](https://issues.apache.org/jira/browse/ARROW-6057?src=confmacro); is it still relevant?
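
For reference, here is a minimal sketch of how I checked the per-column encodings via PyArrow's Parquet metadata API (the file path is a placeholder for one of the Trino-written files):

```python
import pyarrow.parquet as pq

# Placeholder path: substitute a Parquet file written by the Trino Iceberg connector.
pf = pq.ParquetFile("part-00000.parquet")

# Each row group records the encodings used per column chunk;
# inspecting the first row group is usually enough to spot DELTA_BYTE_ARRAY.
row_group = pf.metadata.row_group(0)
for i in range(row_group.num_columns):
    col = row_group.column(i)
    print(col.path_in_schema, col.encodings)
```

On the failing files, the affected string columns show `DELTA_BYTE_ARRAY` in their `encodings` tuple, while the files written by Spark/Iceberg do not.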
