Re: [DISCUSS] Encoding improvements (follow-up from Parquet "V3" discussion)

Julien Le Dem Fri, 31 May 2024 17:35:58 -0700

Micah, would it make sense to start a Google Doc specifically to discuss:
- the goals (there could be a few subsets)
- the candidate encodings
- the existing/future prototypes to validate candidates.


On Thu, May 30, 2024 at 3:14 AM Steve Loughran <[email protected]>
wrote:

> It would be good for a benchmark to be targetable at cloud storage; local stores,
> especially those with SSDs, hide a lot of the costs of data lakes.
>
> On Tue, 28 May 2024 at 07:17, Micah Kornfield <[email protected]>
> wrote:
>
> > As a follow-up to the "V3" Discussions [1][2] I wanted to start a thread on
> > improvements to encodings.
> >
> > There are several areas to pursue here:
> > 1.  Curating a standard set of benchmarks and criteria for determining if a
> > new encoding is worth adding.
> > 2.  Developing new encodings
> > 3.  Better implementations for selecting among existing encodings.
> > 4.  Better support for encodings with point/indexed lookups.
> > 5.  Benchmarking frameworks that allow assessing the trade-offs of encodings on
> > storage systems with different latency/throughput.
> >
> > Realistically, given my current commitments, I don't think I have bandwidth
> > to help with this track in the near term. If someone else would like to
> > help drive this and make concrete proposals in these areas it would be
> > greatly appreciated.
> >
> > Thanks,
> > Micah
> >
> >
> > [1] https://lists.apache.org/thread/5jyhzkwyrjk9z52g0b49g31ygnz73gxo
> > [2] https://docs.google.com/document/d/19hQLYcU5_r5nJB7GtnjfODLlSDiNS24GXAtKg9b0_ls/edit
> >
>
