I'm interested in experimenting with and implementing new encodings.
Will follow up with concrete proposals or findings.

Best,
Gang

On Thu, May 30, 2024 at 3:29 AM Ed Seidl <etse...@live.com> wrote:

> Maybe this is putting the cart too far in front of the horse, but I'd be
> willing to implement an encoding like this to see if it is a better
> alternative to PLAIN and DELTA_LENGTH_BYTE_ARRAY as a dictionary
> fallback for byte arrays, at least for GPU decoding. We might want to
> change the name since it wouldn't be used exclusively for random access
> any longer. Maybe LENGTH_BYTE_ARRAY? Or PLAIN_BYTE_ARRAY?
>
> I'll also raise my hand as interested in participating in all 5 of the
> tasks outlined, as time permits.
>
> Cheers,
> Ed
>
> On 5/28/24 11:05 PM, Micah Kornfield wrote:
> > BTW, I did propose a new RANDOM_ACCESS_BYTE_ARRAY encoding (effectively
> > Arrow's representation) as part of footer improvements [1] to help allow for
> > O(1) access to particular column metadata, once a column is identified.
> >
> > [1] https://github.com/apache/parquet-format/pull/250
> >
> > On Mon, May 27, 2024 at 11:16 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> >> As a follow-up to the "V3" Discussions [1][2] I wanted to start a thread
> >> on improvements to encodings.
> >>
> >> There are several areas to pursue here:
> >> 1.  Curating a standard set of benchmarks and criteria for determining if
> >> a new encoding is worth adding.
> >> 2.  Developing new encodings
> >> 3.  Better implementations to select existing encodings.
> >> 4.  Better support for encodings with point/indexed lookups.
> >> 5.  Benchmarking frameworks that allow assessing trade-off of encodings on
> >> storage systems with different latency/throughput.
> >>
> >> Realistically, given my current commitments, I don't think I have
> >> bandwidth to help with this track in the near term. If someone else would
> >> like to help drive this and make concrete proposals in these areas it would
> >> be greatly appreciated.
> >>
> >> Thanks,
> >> Micah
> >>
> >>
> >> [1] https://lists.apache.org/thread/5jyhzkwyrjk9z52g0b49g31ygnz73gxo
> >> [2] https://docs.google.com/document/d/19hQLYcU5_r5nJB7GtnjfODLlSDiNS24GXAtKg9b0_ls/edit
> >>
>
>
