I agree with what Zoltan said, and I would add that we may use different encodings in the future. We haven't officially closed v2, so we could add different encodings to the spec and not require support for the existing ones. Parquet Java would still be able to read data in those encodings, but there's no guarantee that other readers would add support for them.
I also ran a few tests with some of our company data and didn't find a huge benefit from the existing v2 encodings. That's why I built and proposed different ones. If I were you, I'd stick with v1.

rb

On Wed, Nov 15, 2017 at 12:11 AM, Zoltan Ivanfi <[email protected]> wrote:

> Hi,
>
> In my opinion, compatibility is the main thing to consider here. Some
> applications (Impala being a notable example) only support v1 at the
> moment. You should carefully consider what applications you might want to
> use in the future to process the data and check whether they all support
> v2.
>
> Regards,
>
> Zoltan
>
> On Wed, Nov 15, 2017 at 3:07 AM Ivan Gozali <[email protected]> wrote:
>
> > Hi Parquet maintainers,
> >
> > I was wondering if there are any advantages (e.g. performance increases) or
> > disadvantages (e.g. any stability issues) for setting the configuration
> > parquet.writer.version=v2 in apache-parquet-1.8.2 (particularly curious
> > about this version since Spark 2.2.0 uses it) or above?
> >
> > Thank you in advance!
> >
> > --
> > Regards,
> >
> >
> > Ivan Gozali
> > Lecida
> > Email: [email protected]
> >

--
Ryan Blue
Software Engineer
Netflix
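
The compatibility check suggested in this thread (use v2 only if every application that will read the data supports it, otherwise stay on v1) could be sketched roughly as below. This is only an illustration in plain Java: the class name, the method name, and the reader-to-versions map are hypothetical and not part of Parquet, and you would have to fill in the supported versions for your own readers yourself.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Parquet: picks a value for
// parquet.writer.version based on what the downstream readers support.
class WriterVersionCheck {

    // Returns "v2" only if at least one reader is listed and every listed
    // reader supports "v2"; in all other cases it falls back to "v1".
    static String chooseWriterVersion(Map<String, Set<String>> supportedByReader) {
        if (supportedByReader.isEmpty()) {
            return "v1"; // nothing known about the readers, stay conservative
        }
        for (Set<String> versions : supportedByReader.values()) {
            if (!versions.contains("v2")) {
                return "v1"; // at least one reader cannot handle v2
            }
        }
        return "v2";
    }

    public static void main(String[] args) {
        // Example inventory. "impala" follows the thread (v1 only at the
        // time); "other-reader" is a made-up placeholder.
        Map<String, Set<String>> readers = new LinkedHashMap<>();
        readers.put("impala", new HashSet<>(Arrays.asList("v1")));
        readers.put("other-reader", new HashSet<>(Arrays.asList("v1", "v2")));

        System.out.println("parquet.writer.version=" + chooseWriterVersion(readers));
    }
}
```

The helper deliberately defaults to v1 whenever support is unknown or incomplete, which matches the advice above to stick with v1 unless you are sure about every reader.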
