dmgcodevil edited a comment on issue #12597: URL: https://github.com/apache/arrow/issues/12597#issuecomment-1064360092
@jorisvandenbossche, the files that PyArrow successfully reads were written by the Spark/Iceberg data source and Iceberg's [ParquetWriter](https://github.com/apache/iceberg/blob/master/parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java). The files that PyArrow fails to read were written via the Trino Iceberg catalog (connector). In theory, both Trino and Spark should use Iceberg's ParquetWriter, which internally uses the Hadoop Parquet writer. What I've found is that some columns are encoded as DELTA_BYTE_ARRAY. Does PyArrow support this encoding? I know that fastparquet does not. I also found this [ticket](https://issues.apache.org/jira/browse/ARROW-6057?src=confmacro); is it still relevant?
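
For reference, here is a minimal sketch of how I checked the per-column encodings via PyArrow's Parquet metadata API (the file path is a placeholder for one of the Trino-written files):

```python
import pyarrow.parquet as pq

# Placeholder path: substitute a Parquet file written by the Trino Iceberg connector.
pf = pq.ParquetFile("part-00000.parquet")

# Each row group records the encodings used per column chunk;
# inspecting the first row group is usually enough to spot DELTA_BYTE_ARRAY.
row_group = pf.metadata.row_group(0)
for i in range(row_group.num_columns):
    col = row_group.column(i)
    print(col.path_in_schema, col.encodings)
```

On the failing files, the affected string columns show `DELTA_BYTE_ARRAY` in their `encodings` tuple, while the files written by Spark/Iceberg do not.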
