Hi Joseph.

DELTA_LENGTH_BYTE_ARRAY encoding is a parquet writer v2 feature.

I do not believe that the Spark ParquetReaders implement page v2 at the moment 
(this might be what you mean when you say you’re working on it).

For files generated with DELTA_BYTE_ARRAY encoding (e.g. from Trino), I’ve been 
able to get around it by disabling vectorized Parquet reads. This has solved 
the immediate problem of making the files readable for me, though of course it 
comes with the disadvantage of not getting vectorized reads.

Try changing this setting to false: spark.sql.iceberg.vectorization.enabled
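
In case it helps, here is a rough sketch of flipping that setting from PySpark 
at the session level. The helper name disable_vectorized_reads is made up for 
illustration, and I am assuming the property is picked up from the session 
conf in your Iceberg version, so please verify against your setup.

```python
# Hypothetical helper (not part of Iceberg or Spark): turns off Iceberg's
# vectorized Parquet reads for an existing session via the runtime conf.
ICEBERG_VECTORIZATION_KEY = "spark.sql.iceberg.vectorization.enabled"


def disable_vectorized_reads(spark):
    """Set the Iceberg vectorization flag to "false" and return the session."""
    # spark.conf.set is the standard PySpark runtime-config setter.
    spark.conf.set(ICEBERG_VECTORIZATION_KEY, "false")
    return spark


# Usage, assuming pyspark and the Iceberg runtime are on the classpath:
#   from pyspark.sql import SparkSession
#   spark = disable_vectorized_reads(SparkSession.builder.getOrCreate())
#   spark.table("db.tbl").show()   # "db.tbl" is a placeholder table name
```

The same key can also be passed at launch time with --conf if you would 
rather not change application code.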

Here’s an issue that should get you some more insight: 
https://github.com/apache/iceberg/issues/2692

Let me know if that answers your question!


Kyle Bendickson
Software Engineer
Apple
ACS Data
One Apple Park Way,
Cupertino, CA 95014, USA
[email protected]


> On Jul 19, 2021, at 9:28 PM, Jorge Cardoso Leitão <[email protected]> 
> wrote:
> 
> Hi,
> 
> I am trying to add support for DELTA_LENGTH_BYTE_ARRAY in a package, but I
> am struggling to find readers of it, despite the fact that the spec states
> "This encoding is always preferred over PLAIN for byte array columns.".
> 
> * spark 3.X: Unsupported encoding: DELTA_LENGTH_BYTE_ARRAY
> * pyarrow 4: OSError: Not yet implemented: Unsupported encoding.
> 
> Is there any minimal preferred encodings or people just ignore encodings
> and use either PLAIN or dict? Or are the encodings just not so much
> supported because they do bring sufficient benefits?
> 
> Could someone offer some context to the situation?
> 
> Best,
> Jorge
