Re: Parquet-MR 2.0?

Gang Wu Sun, 24 Sep 2023 19:14:04 -0700

Hi David,

There is already a mailing list discussion [1] and a JIRA issue [2]. Please
take a look and let me know what you think. There is also an open PR [3]
which may interest you.


[1] https://lists.apache.org/thread/d33757j99xqn63hrfz415sq60v3x9hmy
[2] https://issues.apache.org/jira/browse/PARQUET-1822
[3] https://github.com/apache/parquet-mr/pull/1141

Best,
Gang

On Mon, Sep 25, 2023 at 9:49 AM David <[email protected]> wrote:

> Hello Folks,
>
> Probably a repeat, so my apologies in advance.
>
> Is there any appetite for a Parquet 2.0?
>
> In my mind, the greatest need is to cut the dependency on Hadoop and simply
> allow the Parquet file format to exist on its own.
>
> I was recently considering a project: a light-weight stand-alone
> application that reads Iceberg table (Parquet) data.  My use case includes
> a lot of readers on slow-moving data.  Essentially a mini HBase-like client
> that can read data either from S3 or a local file system.
>
> Anyway, I started putting together a quick PoC and forgot that I needed to
> carry with me so very many Hadoop JARs (and their dependencies).  I also
> hit a snag trying to test on a Windows work laptop because the Hadoop file
> IO libraries require some sort of specialized binary support shims.
>
> So, the main goal of version 2 would be to develop the Parquet library as a
> stand-alone pure Java framework, with the other packages (e.g., hadoop,
> protobuf, etc.) offered as additional extensions.
>
> So the package structure would be something like:
>
> - parquet-api (InputSource, ParquetReader, ParquetWriter, etc)
> - parquet-core (the actual parquet framework)
> - parquet-hadoop (e.g., Simple InputSource Implementation, Splitters, etc.)
>
> Thanks.
>
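
For illustration only, the proposed parquet-api / parquet-core split could be
sketched roughly as below. This is a minimal sketch, not code from parquet-mr
or from the linked PR: the InputSource interface shape, the LocalInputSource
class, and the hasParquetMagic helper are all hypothetical names chosen to
match the package list in the quoted message. It relies only on java.nio, so
no Hadoop JARs or native shims are needed, and it checks the one format detail
assumed here: an unencrypted Parquet file ends with the 4-byte magic "PAR1".

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class InputSourceSketch {

    /** Hypothetical parquet-api abstraction: exposes only JDK types, no Hadoop classes. */
    public interface InputSource {
        long length() throws IOException;
        SeekableByteChannel open() throws IOException;
    }

    /** Hypothetical local-file implementation backed by java.nio. */
    public static final class LocalInputSource implements InputSource {
        private final Path path;

        public LocalInputSource(Path path) {
            this.path = path;
        }

        @Override
        public long length() throws IOException {
            return Files.size(path);
        }

        @Override
        public SeekableByteChannel open() throws IOException {
            return Files.newByteChannel(path, StandardOpenOption.READ);
        }
    }

    /** Returns true if the last four bytes of the source are the ASCII magic "PAR1". */
    public static boolean hasParquetMagic(InputSource source) throws IOException {
        long len = source.length();
        if (len < 4) {
            return false;
        }
        ByteBuffer buf = ByteBuffer.allocate(4);
        try (SeekableByteChannel ch = source.open()) {
            ch.position(len - 4);
            // A single read may return fewer bytes than requested, so loop until full.
            while (buf.hasRemaining()) {
                if (ch.read(buf) < 0) {
                    return false;
                }
            }
        }
        return "PAR1".equals(new String(buf.array(), StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in file that only mimics the trailing magic; it is not a valid Parquet file.
        Path tmp = Files.createTempFile("sketch", ".parquet");
        try {
            Files.write(tmp, "PAR1....PAR1".getBytes(StandardCharsets.US_ASCII));
            System.out.println(hasParquetMagic(new LocalInputSource(tmp)));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Under this split, an S3-backed or Hadoop-backed source would implement the
same interface in a separate extension module, so the core reader would never
need to reference Hadoop types.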
