Attendees:
- al...@influxdata.com: Andrew, InfluxData, saying hi, not causing trouble (he says)
- Micah: Google, saying hi, causing trouble (he says)
- Raul: Arrow C++/Py release manager
- Julien: Datadog

Notes:
- Updates
  - Variant: iterating in the parquet-format repo
  - Footer metadata rewrite
    - Need to review individual optimizations
    - Relative indices:
      - Trade-offs:
        - Pro: smaller metadata
        - Con: more complex computation
    - Do we need 2 layers of metadata?
    - Modular vs. one footer?
    - How much does the footer size matter?
- Encodings
  - How to add future-proof encodings
    - Andrew suggested Wasm-based plugins:
      - Expressivity?
      - Security?
      - Could be great for experimenting with new encodings.
      - Not great for having a standard, fully defined format.
      - Query engines do integrate decoding with evaluation.
      - Would need a clear contract:
        - skip(n)
        - decode_n_values_to_arrow
        - …
      - Storing the plugin in the file? => security issues
  - Independent discussion of adding a few encodings
    - Good integer encodings for timestamps
    - Good encodings for floats
    - …
  - Compelling new encoding papers:
    - FastLanes paper: 10x improvement?
    - BtrBlocks: a way to cascade encodings in a better way.

On Wed, Oct 23, 2024 at 8:33 AM Julien Le Dem <jul...@apache.org> wrote:

> The next Parquet sync is today Oct 23rd at 9:30am PT - 12:30pm ET - 6:30pm CET
> (in ~ 1h)
> To join the invite:
> https://calendar.app.google/GjNGkjfMYyoBUpaGA
> Please contact me to be added to the recurring invite.
> Everybody is welcome, bring your topic or just listen in.
> Best
> Julien