Notes from the meeting:

Notes

   - Variant
      - Finalize the variant spec
         - Aihua: spend time on validation and finalize the spec.
         - Java <-> Go for shredding
         - Rust: shredding is in implementation?
         - Commit example to parquet-testing.
            - https://github.com/apache/parquet-testing/issues/75

   - Parquet testing for Haskell <https://github.com/mchav/dataframe>
      - Pure Haskell implementation
      - Use apache/parquet-testing for testing

   - Time interval
      - Yun: agreement on year-month interval
      - Duration: nano? Parameter for time unit.
      - Will follow up on the list and the Java implementation.

   - Encodings
      - Jeff/Prateek:
         - Doc for process in progress: Parquet new features
           <https://docs.google.com/document/d/1qGDnOyoNyPvcN4FCRhbZGAvp0SfewlWo-WVsai5IKUo/edit?tab=t.0>
         - Starting a few proposals:
            - FSST: strings (see paper "FSST: Fast Random Access String
              Compression" <https://www.vldb.org/pvldb/vol13/p2649-boncz.pdf>)
            - ALP: floating-point values (see paper "ALP: Adaptive Lossless
              floating-Point Compression"
              <https://ir.cwi.nl/pub/33334/33334.pdf>)
      - The BtrBlocks paper also has a bunch of good example datasets to
        test for string compression:
        https://www.cs.cit.tum.de/fileadmin/w00cfj/dis/papers/btrblocks.pdf

   - Footer
      - Micah: to follow up with Alkis
      - Rok interested



Action items

   - Aihua, Michael, Martin, David: to collaborate on test files and
     cross-compatibility tests for finalizing Variant. Can use
     https://github.com/apache/parquet-testing/issues/75 for coordinating.
   - Yun: follow up on the mailing list on time intervals.
   - Jeff: start email thread on ALP (or a new encoding).


On Wed, Jul 23, 2025 at 9:28 AM Julien Le Dem <jul...@apache.org> wrote:

> The next Parquet sync is today July 23rd at 10am PT - 1pm ET - 7pm CET
> (in 30 mins)
> I'll be there!
>
> To join the invite, join the group:
> https://groups.google.com/g/apache-parquet-community-sync
>
> Everybody is welcome, bring your topic or just listen in.
>
> (Some more details on how the meeting is run:
> https://lists.apache.org/thread/bjdkscmx7zvgfbw0wlfttxy8h6v3f71t )
>
