I'm interested in experimenting with and implementing new encodings. I will follow up with concrete proposals or findings.
Best,
Gang

On Thu, May 30, 2024 at 3:29 AM Ed Seidl <etse...@live.com> wrote:
> Maybe this is putting the cart too far in front of the horse, but I'd be
> willing to implement an encoding like this to see if it is a better
> alternative to PLAIN and DELTA_LENGTH_BYTE_ARRAY as a dictionary
> fallback for byte arrays, at least for GPU decoding. We might want to
> change the name since it wouldn't be used exclusively for random access
> any longer. Maybe LENGTH_BYTE_ARRAY? Or PLAIN_BYTE_ARRAY?
>
> I'll also raise my hand as interested in participating in all 5 of the
> tasks outlined, as time permits.
>
> Cheers,
> Ed
>
> On 5/28/24 11:05 PM, Micah Kornfield wrote:
> > BTW, I did propose a new RANDOM_ACCESS_BYTE_ARRAY encoding (effectively
> > Arrow's representation) as part of the footer improvements [1] to allow
> > O(1) access to a particular column's metadata once the column is identified.
> >
> > [1] https://github.com/apache/parquet-format/pull/250
> >
> > On Mon, May 27, 2024 at 11:16 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> >> As a follow-up to the "V3" discussions [1][2], I wanted to start a thread
> >> on improvements to encodings.
> >>
> >> There are several areas to pursue here:
> >> 1. Curating a standard set of benchmarks and criteria for determining if
> >>    a new encoding is worth adding.
> >> 2. Developing new encodings.
> >> 3. Better implementation logic for selecting among existing encodings.
> >> 4. Better support for encodings with point/indexed lookups.
> >> 5. Benchmarking frameworks that allow assessing the trade-offs of encodings
> >>    on storage systems with different latency/throughput.
> >>
> >> Realistically, given my current commitments, I don't think I have the
> >> bandwidth to help with this track in the near term. If someone else would
> >> like to help drive this and make concrete proposals in these areas, it would
> >> be greatly appreciated.
> >>
> >> Thanks,
> >> Micah
> >>
> >> [1] https://lists.apache.org/thread/5jyhzkwyrjk9z52g0b49g31ygnz73gxo
> >> [2] https://docs.google.com/document/d/19hQLYcU5_r5nJB7GtnjfODLlSDiNS24GXAtKg9b0_ls/edit
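To make the random-access point in Micah's RANDOM_ACCESS_BYTE_ARRAY note and Ed's comparison with PLAIN concrete, here is a minimal Python sketch. PLAIN for BYTE_ARRAY stores each value as a 4-byte little-endian length followed by its bytes, so reaching value i means walking the i preceding length prefixes. An Arrow-style layout stores n+1 offsets up front and the concatenated bytes after them, so value i is a single slice. The exact on-disk layout proposed in PR 250 is not described in this thread; the offsets layout below (a uint32 count, n+1 little-endian uint32 offsets, then the data) is an assumption for illustration, and all function names are hypothetical.

```python
import struct


def encode_plain(values):
    """PLAIN for BYTE_ARRAY: 4-byte little-endian length, then the bytes."""
    out = bytearray()
    for v in values:
        out += struct.pack("<I", len(v))
        out += v
    return bytes(out)


def plain_get(buf, i):
    """Random access into PLAIN data must skip the i preceding values: O(i)."""
    pos = 0
    for _ in range(i):
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4 + length
    (length,) = struct.unpack_from("<I", buf, pos)
    return buf[pos + 4 : pos + 4 + length]


def encode_offsets(values):
    """Hypothetical Arrow-style layout (NOT the PR 250 format): a uint32
    value count, n+1 uint32 offsets, then all value bytes concatenated."""
    n = len(values)
    offsets = [0]
    for v in values:
        offsets.append(offsets[-1] + len(v))
    header = struct.pack("<I", n) + struct.pack(f"<{n + 1}I", *offsets)
    return header + b"".join(values)


def offsets_get(buf, i):
    """O(1): read offsets[i] and offsets[i+1], then slice the data region."""
    (n,) = struct.unpack_from("<I", buf, 0)
    if not 0 <= i < n:
        raise IndexError(i)
    start, end = struct.unpack_from("<2I", buf, 4 + 4 * i)
    data_base = 4 + 4 * (n + 1)
    return buf[data_base + start : data_base + end]


values = [b"parquet", b"", b"arrow"]
plain = encode_plain(values)
ra = encode_offsets(values)
print(plain_get(plain, 2), offsets_get(ra, 2))
```

Both layouts spend 4 bytes of overhead per value here; the difference is only where the lengths live. DELTA_LENGTH_BYTE_ARRAY also places all lengths ahead of the data, but delta-encoded, so a reader has to decode the whole length block and take a prefix sum before it can jump to value i. That trade-off between size and direct addressability is presumably what a benchmark of a LENGTH_BYTE_ARRAY/PLAIN_BYTE_ARRAY-style dictionary fallback would need to measure.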