Re: Parquet sync May 28th 2025

Micah Kornfield Fri, 06 Jun 2025 21:45:23 -0700

Hi Selim,
Sorry, it's not clear from the notes. What was your question?

Thanks,
Micah


On Sat, May 31, 2025 at 9:54 PM Selim S <[email protected]> wrote:

> Hi - unfortunately my question has not been answered yet.
>
> On Sat, May 31, 2025 at 01:37, Julien Le Dem <[email protected]> wrote:
>
> > Attendees:
> >
> >    - Julien: Datadog
> >    - Alex: Google, listening in
> >    - Aditya: CMU Variant
> >    - Alkis: Databricks, new footer update
> >    - Andrew Lamb: Influx Data. listen
> >    - Brian: Google
> >    - Dewey: geometry
> >    - Talat: Google.
> >    - Jeff: Snowflake. Footer, encoding
> >    - Martin: CMU variant
> >    - Mengmeng: snowflake
> >    - Prashant: snowflake
> >    - Prateek: snowflake scan team. Encodings, exp
> >    - Russel: Snowflake. Duck
> >    - Sai: snowflake
> >    - Sandieep: snowflake
> >    - Selcuk: snowflake
> >    - Selim: detection partition field, java api.modification date?
> >    - Thomas: snowflake: floating type proposal.
> >    - Vinoo: startup
> >    - Micah: Databricks.
> >
> >
> > Agenda/Notes:
> >
> >    - New footer update (Alkis):
> >       - Prototype running in Databricks:
> >          - https://github.com/apache/arrow/pull/43793
> >       - Deserialization results:
> >          - For compatibility: generate the thrift data structure from the
> >            flatbuffer.
> >          - Deserialization speed improved by 30% on average
> >          - Pathological cases: 5 to 10x speedup
> >          - Some cases show a regression (20-30%)
> >             - Need to figure out why and fix it.
> >          - Expect better fetch optimization because the footer is smaller.
> >          - This is all without taking advantage of:
> >             - Partial deserialization
> >             - Not having to produce the thrift
> >       - Do we need to evaluate in the context of caching files in SSDs,
> >         memory, etc.?
> >    - Follow ups:
> >       - Process to adopt new encodings (Sorry, Micah will share doc tonight
> >         for community input)
> >          - Selcuk, Jeff
> >          - Talat
> >          - Micah, Alkis
> >    - [Thomas] DECFLOAT Parquet Proposal
> >      <https://docs.google.com/document/d/1j_Q6vnn6Nhy60K4o0tdC91kE5vKGNJaoDOAm71KLzNw/edit?tab=t.0#heading=h.a3yn4bu050pz>
> >       - Decimal floating point type: a third type beyond fixed point and
> >         floating point
> >       - Spark supports DECFLOAT (using BigDecimal)
> >
> >
> > On Tue, May 27, 2025 at 8:35 PM Julien Le Dem <[email protected]> wrote:
> >
> > > The next Parquet sync is tomorrow May 28th at 10am PT - 1pm ET - 7pm CET
> > > To join the invite, join the group:
> > > https://groups.google.com/g/apache-parquet-community-sync
> > >
> > > Everybody is welcome, bring your topic or just listen in.
> > >
> > > (Some more details on how the meeting is run:
> > > https://lists.apache.org/thread/bjdkscmx7zvgfbw0wlfttxy8h6v3f71t )
> > >
> >
>
