Hi Joseph. DELTA_LENGTH_BYTE_ARRAY encoding is a parquet writer v2 feature.
I do not believe that the Spark ParquetReaders implement page v2 at the moment (this might be what you mean when you say you're working on it).

For files generated with DELTA_BYTE_ARRAY encoding (e.g. from Trino), I've been able to get around it by disabling vectorized Parquet reads. That made the files readable for me, though of course at the cost of losing vectorized reads. Try setting this to false:

spark.sql.iceberg.vectorization.enabled

Here's an issue that should give you some more insight: https://github.com/apache/iceberg/issues/2692

Let me know if that answers your question!

Kyle Bendickson
Software Engineer
Apple
ACS Data
One Apple Park Way, Cupertino, CA 95014, USA

> On Jul 19, 2021, at 9:28 PM, Jorge Cardoso Leitão wrote:
>
> Hi,
>
> I am trying to add support for DELTA_LENGTH_BYTE_ARRAY in a package, but I
> am struggling to find readers of it, despite the fact that the spec states
> "This encoding is always preferred over PLAIN for byte array columns.".
>
> * spark 3.X: Unsupported encoding: DELTA_LENGTH_BYTE_ARRAY
> * pyarrow 4: OSError: Not yet implemented: Unsupported encoding.
>
> Are there any minimal preferred encodings, or do people just ignore encodings
> and use either PLAIN or dict?
> Or are the encodings just not widely
> supported because they do not bring sufficient benefits?
>
> Could someone offer some context to the situation?
>
> Best,
> Jorge
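
For reference, the workaround from the reply (setting spark.sql.iceberg.vectorization.enabled to false) could be applied per session roughly as follows. This is only a sketch: the helper name disable_iceberg_vectorized_reads is hypothetical, and `spark` is assumed to be an already-created PySpark SparkSession with Iceberg on the classpath. Only the property name comes from the thread.

```python
# Property name taken from the reply above.
VECTORIZATION_KEY = "spark.sql.iceberg.vectorization.enabled"


def disable_iceberg_vectorized_reads(spark):
    """Hypothetical helper: turn off Iceberg's vectorized Parquet reads.

    Assumes `spark` is an existing pyspark.sql.SparkSession (or anything
    exposing the same `conf.set` / `conf.get` methods).
    Returns the value now stored under the key so the caller can check it.
    """
    # Session-level runtime configuration; values are passed as strings.
    spark.conf.set(VECTORIZATION_KEY, "false")
    return spark.conf.get(VECTORIZATION_KEY)
```

The same property can presumably also be passed at launch time, e.g. `--conf spark.sql.iceberg.vectorization.enabled=false`; as noted in the reply, either way the files become readable at the cost of non-vectorized reads.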
