I believe DuckDB has its own custom Parquet implementation [1].

[1] https://github.com/duckdb/duckdb/blob/26cb7178fd89f924a936874e5c09ec1f6df8a0a4/extension/parquet/parquet_extension.cpp#L88

On Tue, Jan 14, 2025 at 3:11 PM Steve Loughran <ste...@cloudera.com.invalid>
wrote:

> Is this the library used by DuckDB? I've heard that it doesn't add
> statistics to Parquet files, which is unfortunate.
>
> On Tue, 14 Jan 2025 at 15:13, Andrew Lamb <andrewlam...@gmail.com> wrote:
>
> > I believe Ed added these statistics into parquet-rs[1] as well. We have
> > also enabled them by default and haven't seen any performance issues.
> >
> > Andrew
> >
> > [1] https://github.com/apache/arrow-rs/pull/6105
> >
> > On Tue, Jan 14, 2025 at 9:38 AM Gang Wu <ust...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > The C++ Parquet implementation in Apache Arrow (namely parquet-cpp) has
> > > supported Page Index since 13.0.0, and SizeStatistics support was
> > > recently added in 19.0.0. Both features are disabled by default. We ran
> > > a benchmark, and the results showed that we can enable them by default
> > > with acceptable penalties. I have therefore opened a PR [1] to turn them
> > > on by default; the benchmark results are also available in that PR. Any
> > > feedback is welcome. If there are no objections, we will merge this PR
> > > and release it with Apache Arrow 20.0.0.
> > >
> > > [1] https://github.com/apache/arrow/pull/45249
> > >
> > > Best,
> > > Gang
> > >
> >
>
