Unfortunately, a -1 from my side (non-binding)

I've updated Iceberg to Parquet 1.11.0, and found three things:

   - We've broken backward compatibility of the constructor of
   ColumnChunkPageWriteStore
   
<https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e#diff-d007a18083a2431c30a5416f248e0a4bR80>.
   This required a change
   
<https://github.com/apache/incubator-iceberg/pull/297/files#diff-b877faa96f292b851c75fe8bcc1912f8R176>
   to the code. This isn't a hard blocker, but if there will be a new RC, I've
   submitted a patch: https://github.com/apache/parquet-mr/pull/699
   - Related, that we need to put in the changelog, is that checksums are
   enabled by default:
   
https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L54
This
   will impact performance. I would suggest disabling it by default:
   https://github.com/apache/parquet-mr/pull/700
   
<https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e#diff-d007a18083a2431c30a5416f248e0a4bR277>
   - Binary compatibility. While updating Iceberg, I've noticed that the
   split-test was failing:
   
https://github.com/apache/incubator-iceberg/pull/297/files#diff-4b64b7014f259be41b26cfb73d3e6e93L199
The
   two records are now divided over four Spark partitions. Something in the
   output has changed since the files are bigger now. Has anyone any idea to
   check what's changed, or a way to check this? The only thing I can think of
   is the checksum mentioned above.

$ ls -lah ~/Desktop/parquet-1-1*
-rw-r--r--  1 fokkodriesprong  staff   562B 17 nov 21:09
/Users/fokkodriesprong/Desktop/parquet-1-10-1.parquet
-rw-r--r--  1 fokkodriesprong  staff   611B 17 nov 21:05
/Users/fokkodriesprong/Desktop/parquet-1-11-0.parquet

$ parquet-tools cat /Users/fokkodriesprong/Desktop/parquet-1-10-1.parquet
id = 1
data = a

$ parquet-tools cat /Users/fokkodriesprong/Desktop/parquet-1-11-0.parquet
id = 1
data = a

A binary diff here:
https://gist.github.com/Fokko/1c209f158299dc2fb5878c5bae4bf6d8

Cheers, Fokko

Op za 16 nov. 2019 om 04:18 schreef Junjie Chen <chenjunjied...@gmail.com>:

> +1
> Verified signature, checksum and ran mvn install successfully.
>
> Wang, Yuming <yumw...@ebay.com.invalid> 于2019年11月14日周四 下午2:05写道:
> >
> > +1
> > Tested Parquet 1.11.0 with Spark SQL module: build/sbt "sql/test-only"
> -Phadoop-3.2
> >
> > On 2019/11/13, 21:33, "Gabor Szadovszky" <ga...@apache.org> wrote:
> >
> >     Hi everyone,
> >
> >     I propose the following RC to be released as official Apache Parquet
> 1.11.0
> >     release.
> >
> >     The commit id is 18519eb8e059865652eee3ff0e8593f126701da4
> >     * This corresponds to the tag: apache-parquet-1.11.0-rc7
> >     *
> >
> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fparquet-mr%2Ftree%2F18519eb8e059865652eee3ff0e8593f126701da4&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=ToLFrTB9lU%2FGzH6UpXwy7PAY7kaupbyKAgdghESCfgg%3D&amp;reserved=0
> >
> >     The release tarball, signature, and checksums are here:
> >     *
> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdist.apache.org%2Frepos%2Fdist%2Fdev%2Fparquet%2Fapache-parquet-1.11.0-rc7&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=MPaHiYJT7ZcqreAYUkvDvZugthUhRPrySdXpN2ytT5k%3D&amp;reserved=0
> >
> >     You can find the KEYS file here:
> >     *
> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fapache.org%2Fdist%2Fparquet%2FKEYS&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=IwG4MUGsP2lVzlD4bwZUEPuEAPUg%2FHXRYtxc5CQupBM%3D&amp;reserved=0
> >
> >     Binary artifacts are staged in Nexus here:
> >     *
> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Frepository.apache.org%2Fcontent%2Fgroups%2Fstaging%2Forg%2Fapache%2Fparquet%2F&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=lHtqLRQqQFwsyoaLSVaJuau5gxPKsCQFFVJaY8H0tZQ%3D&amp;reserved=0
> >
> >     This release includes the changes listed at:
> >
> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fparquet-mr%2Fblob%2Fapache-parquet-1.11.0-rc7%2FCHANGES.md&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=82BplI3bLAL6qArLHvVoYReZOk%2BboSP655rI8VX5Q5I%3D&amp;reserved=0
> >
> >     Please download, verify, and test.
> >
> >     Please vote in the next 72 hours.
> >
> >     [ ] +1 Release this as Apache Parquet 1.11.0
> >     [ ] +0
> >     [ ] -1 Do not release this because...
> >
> >
>

Reply via email to