Re: Pros/cons of setting parquet.writer.version=v2

Ryan Blue Wed, 15 Nov 2017 08:27:46 -0800

I agree with what Zoltan said, and I would add that we may use different
encodings in the future. We haven't officially closed v2, so we could add
different encodings to the spec and not require support for the existing
ones. Parquet Java would still be able to read data in those encodings, but
there's no guarantee that other readers would add support for them.


I also ran a few tests with some of our company data and didn't find a huge
benefit to the existing v2 encodings. That's why I built and proposed
different ones. If I were you, I'd stick with v1.
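
If you want to pin that choice explicitly rather than rely on the default,
here is a minimal sketch of how the setting could be assembled. The property
name parquet.writer.version and the values v1/v2 come from this thread; the
helper names (writer_version_conf, as_submit_args) are hypothetical, and the
spark.hadoop. prefix assumes you are passing the option through Spark, which
forwards such settings to the Hadoop configuration.

```python
# Sketch only: the helper names below are hypothetical, not a Parquet or Spark API.
PARQUET_WRITER_VERSION_KEY = "parquet.writer.version"
ALLOWED_VERSIONS = ("v1", "v2")


def writer_version_conf(version="v1", spark=True):
    """Return a one-entry dict that pins the Parquet writer version."""
    if version not in ALLOWED_VERSIONS:
        raise ValueError(f"unsupported writer version: {version!r}")
    key = PARQUET_WRITER_VERSION_KEY
    if spark:
        # Assumption: the job runs on Spark, which copies "spark.hadoop.*"
        # settings into the Hadoop configuration seen by the Parquet writer.
        key = "spark.hadoop." + key
    return {key: version}


def as_submit_args(conf):
    """Render a conf dict as spark-submit style --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the arguments that would pin the writer to v1.
    print(" ".join(as_submit_args(writer_version_conf("v1"))))
```

Rejecting anything other than v1 or v2 up front is a design choice of the
sketch, so that a typo fails early instead of reaching the writer.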

rb

On Wed, Nov 15, 2017 at 12:11 AM, Zoltan Ivanfi <[email protected]> wrote:

> Hi,
>
> In my opinion, compatibility is the main thing to consider here. Some
> applications (Impala being a notable example) only support v1 at the
> moment. You should carefully consider what applications you might want to
> use in the future to process the data and check whether they all support
> v2.
>
> Regards,
>
> Zoltan
>
> On Wed, Nov 15, 2017 at 3:07 AM Ivan Gozali <[email protected]> wrote:
>
> > Hi Parquet maintainers,
> >
> > I was wondering if there are any advantages (e.g. performance increases)
> > or disadvantages (e.g. any stability issues) for setting the configuration
> > parquet.writer.version=v2 in apache-parquet-1.8.2 (particularly curious
> > about this version since Spark 2.2.0 uses it) or above?
> >
> > Thank you in advance!
> >
> > --
> > Regards,
> >
> >
> > Ivan Gozali
> > Lecida
> > Email: [email protected]
> >
>



-- 
Ryan Blue
Software Engineer
Netflix
