I believe Iceberg also does not support V2 encodings in general [1].

In terms of changing the default, it would be nice to see if we can close
the gap on the difference Rust is seeing vs Trino, to make sure we aren't
overly pessimizing readers (and do some benchmarks on other
implementations). I think changing the default is something that might be
better as part of a major version release of parquet-mr.

This is a plug for the implementation matrix [2] that would be good to fill
out (it seems Trino should potentially be another column here).

[1] https://github.com/apache/iceberg/issues/11371
[2]
https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md

On Fri, Nov 29, 2024 at 3:56 AM Raunaq Morarka <raunaqmora...@gmail.com>
wrote:

> > While everything supported reading parquet files with the default
> > parquet-java settings, the support was significantly less complete
>
> Agreed, that was one of my motivations to push for this change of default
> in parquet-java, as that would be a better guarantee that using
> DELTA_LENGTH_BYTE_ARRAY in V1 pages for BINARY would work more widely
> than just being in the parquet spec.
> I already found in my testing that such files are readable by Trino,
> Apache Hive, and the parquet CLI, but not by Apache Spark 3.4.2.
> I filed https://issues.apache.org/jira/browse/SPARK-50457 for it. This is
> supposed to be supported by Spark since
> https://issues.apache.org/jira/browse/SPARK-40128
>
> On Fri, 29 Nov 2024 at 16:21, Andrew Lamb <andrewlam...@gmail.com> wrote:
>
> > > Does anything in the parquet format spec speak against using
> > > DELTA_LENGTH_BYTE_ARRAY in V1 pages? I didn't find anything in the
> > > spec to confirm or deny this.
> >
> > My biggest suggestion when looking to use an encoding other than PLAIN
> > is to consider the ecosystem support. When working on InfluxDB 3.0 we
> > found that support across various tools that read parquet varied.
> >
> > While everything supported reading parquet files with the default
> > parquet-java settings, support for other encodings was significantly
> > less complete. More details on this thread[1].
> >
> > [1]: https://lists.apache.org/thread/tnxbykozo5owq2y36nw7lomr91hrdxhz
> >
> > On Thu, Nov 28, 2024 at 2:22 PM Raunaq Morarka <raunaqmora...@gmail.com>
> > wrote:
> >
> > > > Skipping over values similarly does not show major performance
> > > > differences, as neither encoding actually provides efficient random
> > > > lookup, in both cases requiring scanning through either N values or
> > > > N lengths.
> > >
> > > With DELTA_LENGTH_BYTE_ARRAY, you can decode the lengths in a batch
> > > and compute the offset to skip to in the strings portion directly
> > > from that.
> > > See
> > > https://github.com/trinodb/trino/blob/ae7d300f024d4f415dfc5f6f63d387418844a172/lib/trino-parquet/src/main/java/io/trino/parquet/reader/decoders/DeltaLengthByteArrayDecoders.java#L135
> > > for example.
> > > With PLAIN we're going back and forth between decoding an int and
> > > using that to skip to the next encoded length position.
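The contrast described here can be sketched in plain Java. This is a
hypothetical illustration, not the actual Trino code; the class and method
names are made up, and it assumes the PLAIN layout of a little-endian
4-byte length prefix before each value:

```java
// Hypothetical sketch: skipping N values under the two layouts.
public class SkipSketch {
    // DELTA_LENGTH_BYTE_ARRAY-style: lengths already decoded into an int[].
    // Finding the byte offset of value n is a tight sum over integers,
    // which is batch/SIMD friendly.
    static int skipOffsetSeparated(int[] lengths, int n) {
        int offset = 0;
        for (int i = 0; i < n; i++) {
            offset += lengths[i];
        }
        return offset;
    }

    // PLAIN-style: length prefixes interleaved with the value bytes.
    // The position of value i+1 is unknown until value i's 4-byte
    // little-endian prefix has been read, forcing a value-by-value walk.
    static int skipOffsetPlain(byte[] page, int n) {
        int pos = 0;
        for (int i = 0; i < n; i++) {
            int len = (page[pos] & 0xFF)
                    | (page[pos + 1] & 0xFF) << 8
                    | (page[pos + 2] & 0xFF) << 16
                    | (page[pos + 3] & 0xFF) << 24;
            pos += 4 + len; // jump over prefix + data to reach the next prefix
        }
        return pos;
    }
}
```

The first loop is a straight reduction over an int array; the second has a
loop-carried dependency through the page bytes themselves.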
> > >
> > > > Given the non-SIMD friendly way it encodes the length information I
> > > > would indeed expect it to be slower.
> > >
> > > While the encoding of the length values is not straightforward, it's
> > > still possible to write batched routines for decoding bit-packed
> > > integers followed by a prefix sum.
> > > E.g.
> > > https://github.com/trinodb/trino/blob/ae7d300f024d4f415dfc5f6f63d387418844a172/lib/trino-parquet/src/main/java/io/trino/parquet/reader/decoders/DeltaPackingUtils.java#L52
> > >
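As a rough sketch of that two-pass shape (hypothetical code, not the
DeltaPackingUtils implementation; it assumes the deltas have already been
unpacked from their bit-packed miniblocks into an int[], and that minDelta
is the per-block minimum the format subtracts before packing):

```java
// Hypothetical sketch: lengths are stored DELTA_BINARY_PACKED, i.e. a
// first value plus bit-packed deltas relative to a block minimum. Two
// prefix-sum passes turn the unpacked deltas into absolute lengths and
// then into start offsets into the strings portion.
public class PrefixSumSketch {
    // First pass: running sum of (minDelta + delta) recovers the lengths.
    static int[] lengthsFromDeltas(int firstValue, int minDelta, int[] deltas) {
        int[] lengths = new int[deltas.length + 1];
        lengths[0] = firstValue;
        for (int i = 0; i < deltas.length; i++) {
            lengths[i + 1] = lengths[i] + minDelta + deltas[i];
        }
        return lengths;
    }

    // Second pass: lengths -> offsets (offsets[i] = start of value i,
    // offsets[lengths.length] = total byte size).
    static int[] offsetsFromLengths(int[] lengths) {
        int[] offsets = new int[lengths.length + 1];
        for (int i = 0; i < lengths.length; i++) {
            offsets[i + 1] = offsets[i] + lengths[i];
        }
        return offsets;
    }
}
```

Both passes are simple linear scans over int arrays, which is what makes a
batched implementation feasible despite the encoding's layered layout.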
> > > > Now I am not familiar with the Trino benchmark referred to, and it
> > > > may be taking IO into account which would be impacted by overall
> > > > data size
> > >
> > > The benchmark at
> > > https://github.com/trinodb/trino/blob/7c6157343583bdad8933a899f7e8fd3ecd4de1a9/lib/trino-parquet/src/test/java/io/trino/parquet/reader/BenchmarkBinaryColumnReader.java#L44
> > > is not doing any IO.
> > > It generates the data for testing in-memory and runs the decoder on
> > > it. It does not use any compression either.
> > >
> > > I can see that the performance may come down to implementation
> > > details, especially whether there is the ability to avoid copies of
> > > the string for PLAIN encoding.
> > > Maybe it makes sense for Trino to change the default for its own
> > > writer.
> > > Does anything in the parquet format spec speak against using
> > > DELTA_LENGTH_BYTE_ARRAY in V1 pages? I didn't find anything in the
> > > spec to confirm or deny this.
> > >
> > > On Thu, 28 Nov 2024 at 23:56, Raphael Taylor-Davies
> > > <r.taylordav...@googlemail.com.invalid> wrote:
> > >
> > > > For what it is worth this performance disparity may not be a
> > > > property of the encoding but instead the Java implementation. At
> > > > least in arrow-rs DELTA_LENGTH_BYTE_ARRAY is ~30% slower than PLAIN
> > > > when reading data from memory. Given the non-SIMD friendly way it
> > > > encodes the length information I would indeed expect it to be
> > > > slower. Skipping over values similarly does not show major
> > > > performance differences, as neither encoding actually provides
> > > > efficient random lookup, in both cases requiring scanning through
> > > > either N values or N lengths.
> > > >
> > > > Now I am not familiar with the Trino benchmark referred to, and it
> > > > may be taking IO into account which would be impacted by overall
> > > > data size, but I thought I'd provide another data point.
> > > >
> > > > I'd also add that many modern engines, e.g. DuckDB and Velox, use a
> > > > string encoding that avoids needing to copy the string data even
> > > > when the data is PLAIN encoded, and all arrow readers supporting the
> > > > ViewArray types can perform the same optimisation. Arrow-rs does
> > > > this; however, in the benchmark I was running the strings were
> > > > relatively short (~43 bytes), and so the 30% performance hit of
> > > > DELTA_LENGTH_BYTE_ARRAY remained.
> > > >
> > > > Kind Regards,
> > > >
> > > > Raphael Taylor-Davies
> > > >
> > > > On 28/11/2024 17:23, Raunaq Morarka wrote:
> > > > > The current default for V1 pages is PLAIN encoding. This encoding
> > > > > mixes string length with string data. This is inefficient for
> > > > > skipping N values, as the encoding does not allow random access.
> > > > > It's also slow to decode as the interleaving of lengths with data
> > > > > does not allow efficient batched implementations and forces most
> > > > > implementations to make copies of the data to fit the usual
> > > > > representation of separate offsets and data for strings.
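A minimal sketch of the copy being described here (hypothetical code, not
parquet-java's reader; the class name is made up, and it assumes the PLAIN
layout of a little-endian 4-byte length prefix per value):

```java
// Hypothetical sketch: converting a PLAIN BYTE_ARRAY page into the usual
// columnar form of separate offsets plus one contiguous data buffer.
// Because the length prefixes are interleaved with the value bytes, every
// value's bytes have to be rewritten into a new buffer.
public class PlainToColumnar {
    static int[] offsets;  // offsets[i] = start of value i in data
    static byte[] data;    // value bytes with the 4-byte prefixes stripped

    static void split(byte[] page, int valueCount) {
        offsets = new int[valueCount + 1];
        data = new byte[page.length - 4 * valueCount];
        int src = 0;
        int dst = 0;
        for (int i = 0; i < valueCount; i++) {
            int len = (page[src] & 0xFF)
                    | (page[src + 1] & 0xFF) << 8
                    | (page[src + 2] & 0xFF) << 16
                    | (page[src + 3] & 0xFF) << 24;
            src += 4;
            // the copy PLAIN forces: move the bytes past the prefix
            System.arraycopy(page, src, data, dst, len);
            src += len;
            dst += len;
            offsets[i + 1] = dst;
        }
    }
}
```

With DELTA_LENGTH_BYTE_ARRAY the strings portion already has this shape,
so (absent compression) a reader can in principle reference it in place.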
> > > > >
> > > > > DELTA_LENGTH_BYTE_ARRAY has none of the above problems as it
> > > > > separates offsets and data. The parquet-format spec also seems to
> > > > > recommend this:
> > > > > https://github.com/apache/parquet-format/blob/c70281359087dfaee8bd43bed9748675f4aabe11/Encodings.md?plain=1#L299
> > > > >
> > > > > ### Delta-length byte array: (DELTA_LENGTH_BYTE_ARRAY = 6)
> > > > >
> > > > > Supported Types: BYTE_ARRAY
> > > > >
> > > > > This encoding is always preferred over PLAIN for byte array
> > > > > columns.
> > > > >
> > > > > V2 pages use DELTA_BYTE_ARRAY as the default encoding; this is an
> > > > > improvement over PLAIN but adds complexity which makes it slower
> > > > > to decode than DELTA_LENGTH_BYTE_ARRAY, with the potential benefit
> > > > > of lower storage requirements.
> > > > >
> > > > > JMH benchmarks in Trino's parquet reader at
> > > > > io.trino.parquet.reader.BenchmarkBinaryColumnReader showed that
> > > > > DELTA_LENGTH_BYTE_ARRAY can be decoded at over 5X and
> > > > > DELTA_BYTE_ARRAY at over 2X the speed of PLAIN encoding.
> > > > > Given the above recommendation of the parquet-format spec and the
> > > > > significant performance difference, I'm proposing updating
> > > > > parquet-java to use DELTA_LENGTH_BYTE_ARRAY instead of PLAIN by
> > > > > default for V1 pages.
> > > > >
> > > >
> > >
> > >
> > > --
> > > Regards,
> > > Raunaq Morarka,
> > > Email: raunaqmora...@gmail.com
> > >
> >
>
>
> --
> Regards,
> Raunaq Morarka,
> Email: raunaqmora...@gmail.com
>
