Hi Jorge,
I agree, I don't think you can decode the actual binary data without doing
some level of decoding of the lengths.  It does seem possible to skip most of
that work, though, by decoding only the block header data and the bit widths
of each miniblock, which is enough to skip ahead to the next block.
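
As a rough sketch of that idea (untested against real files, and assuming the
lengths follow the DELTA_BINARY_PACKED layout as described in the spec), the
walk could look like the following. The names `read_uleb128` and
`values_offset` are made up for illustration and are not from any Parquet
library:

```python
def read_uleb128(buf, pos):
    """Read one ULEB128 varint from buf at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def values_offset(page):
    """Return the byte offset where the "values" part starts.

    Walks the DELTA_BINARY_PACKED lengths without unpacking any delta:
    only the header, each block's min delta, and the miniblock bit widths
    are read. Hypothetical helper, assuming `page` starts at the lengths.
    """
    pos = 0
    block_size, pos = read_uleb128(page, pos)
    miniblocks_per_block, pos = read_uleb128(page, pos)
    total_count, pos = read_uleb128(page, pos)
    # First value is a zigzag varint; only its byte length matters here.
    _first_value, pos = read_uleb128(page, pos)

    values_per_miniblock = block_size // miniblocks_per_block
    remaining = total_count - 1  # the first value lives in the header

    while remaining > 0:
        # Min delta is also a zigzag varint that we only need to step over.
        _min_delta, pos = read_uleb128(page, pos)
        bit_widths = page[pos:pos + miniblocks_per_block]
        pos += miniblocks_per_block
        for width in bit_widths:
            if remaining <= 0:
                # Unneeded miniblocks in the last block have a bit width
                # byte but no data, so there is nothing to skip.
                break
            # A miniblock is always padded to its full size.
            pos += values_per_miniblock * width // 8
            remaining -= values_per_miniblock
    return pos
```

For the (5, 5, 6, 6) example, with a block size of 128 and 4 miniblocks per
block (my choice, not something the spec mandates), the lengths take 14 bytes,
so `page[values_offset(page):]` would start at "HelloWorld...". You still
touch every block, but never unpack an individual delta.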

Cheers,
-Micah



On Thu, Jan 27, 2022 at 10:33 AM Jorge Cardoso Leitão <
[email protected]> wrote:

> Hi,
>
> Say we have a page of 4 elements that are "DELTA_LENGTH_BYTE_ARRAY" and we
> would like to create 2 arrays of 2 elements each in a streaming fashion.
>
> I can't find a process over which we can do this without first reading all
> the encoded lengths.
> This is because I have been unable to find a way to identify where the
> "values" part starts.
>
> Using the example in the spec [1]: DeltaEncoding(5, 5, 6, 6)
> "HelloWorldFoobarABCDEF"
>
> say they are encoded as bytes as `[a,b,c,d,e, ...]["H", "e", ...]`. How can
> we find the position between ] and [?
>
> Best,
> Jorge
>
> [1]
>
> https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-length-byte-array-delta_length_byte_array--6
>
