Micah, would it make sense to start a Google Doc specifically to discuss:
- the goals (there could be a few subsets)
- the candidate encodings
- the existing/future prototypes to validate candidates?

On Thu, May 30, 2024 at 3:14 AM Steve Loughran <ste...@cloudera.com.invalid> wrote:

> be good for a benchmark to be targetable at cloud storage; local stores,
> especially those with SSD, hide a lot of the costs of datalakes
>
> On Tue, 28 May 2024 at 07:17, Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > As a follow-up to the "V3" Discussions [1][2] I wanted to start a thread on
> > improvements to encodings.
> >
> > There are several areas to pursue here:
> > 1. Curating a standard set of benchmarks and criteria for determining if a
> > new encoding is worth adding.
> > 2. Developing new encodings
> > 3. Better implementations to select existing encodings.
> > 4. Better support for encodings with point/indexed lookups.
> > 5. Benchmarking frameworks that allow assessing trade-off of encodings on
> > storage systems with different latency/throughput.
> >
> > Realistically, given my current commitments, I don't think I have bandwidth
> > to help with this track in the near term. If someone else would like to
> > help drive this and make concrete proposals in these areas it would be
> > greatly appreciated.
> >
> > Thanks,
> > Micah
> >
> >
> > [1] https://lists.apache.org/thread/5jyhzkwyrjk9z52g0b49g31ygnz73gxo
> > [2] https://docs.google.com/document/d/19hQLYcU5_r5nJB7GtnjfODLlSDiNS24GXAtKg9b0_ls/edit