Hi Fokko,

For the first point. The referenced constructor is private and Iceberg uses
it via reflection. It is not a breaking change. I think, parquet-mr shall
not keep private methods only because of clients might use them via
reflection.

About the checksum. I've agreed on having the CRC checksum write enabled by
default because the benchmarks did not show significant performance
penalties. See https://github.com/apache/parquet-mr/pull/647 for details.

About the file size change. 1.11.0 is introducing column indexes, CRC
checksum, removing the statistics from the page headers and maybe other
changes that impact file size. If only file size is in question I cannot
see a breaking change here.

Regards,
Gabor



On Sun, Nov 17, 2019 at 9:27 PM Driesprong, Fokko <[email protected]>
wrote:

> Unfortunately, a -1 from my side (non-binding)
>
> I've updated Iceberg to Parquet 1.11.0, and found three things:
>
>    - We've broken backward compatibility of the constructor of
>    ColumnChunkPageWriteStore
>    <
> https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e#diff-d007a18083a2431c30a5416f248e0a4bR80
> >.
>    This required a change
>    <
> https://github.com/apache/incubator-iceberg/pull/297/files#diff-b877faa96f292b851c75fe8bcc1912f8R176
> >
>    to the code. This isn't a hard blocker, but if there will be a new RC,
> I've
>    submitted a patch: https://github.com/apache/parquet-mr/pull/699
>    - Related, that we need to put in the changelog, is that checksums are
>    enabled by default:
>
> https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L54
> This
>    will impact performance. I would suggest disabling it by default:
>    https://github.com/apache/parquet-mr/pull/700
>    <
> https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e#diff-d007a18083a2431c30a5416f248e0a4bR277
> >
>    - Binary compatibility. While updating Iceberg, I've noticed that the
>    split-test was failing:
>
> https://github.com/apache/incubator-iceberg/pull/297/files#diff-4b64b7014f259be41b26cfb73d3e6e93L199
> The
>    two records are now divided over four Spark partitions. Something in the
>    output has changed since the files are bigger now. Has anyone any idea
> to
>    check what's changed, or a way to check this? The only thing I can
> think of
>    is the checksum mentioned above.
>
> $ ls -lah ~/Desktop/parquet-1-1*
> -rw-r--r--  1 fokkodriesprong  staff   562B 17 nov 21:09
> /Users/fokkodriesprong/Desktop/parquet-1-10-1.parquet
> -rw-r--r--  1 fokkodriesprong  staff   611B 17 nov 21:05
> /Users/fokkodriesprong/Desktop/parquet-1-11-0.parquet
>
> $ parquet-tools cat /Users/fokkodriesprong/Desktop/parquet-1-10-1.parquet
> id = 1
> data = a
>
> $ parquet-tools cat /Users/fokkodriesprong/Desktop/parquet-1-11-0.parquet
> id = 1
> data = a
>
> A binary diff here:
> https://gist.github.com/Fokko/1c209f158299dc2fb5878c5bae4bf6d8
>
> Cheers, Fokko
>
> Op za 16 nov. 2019 om 04:18 schreef Junjie Chen <[email protected]
> >:
>
> > +1
> > Verified signature, checksum and ran mvn install successfully.
> >
> > Wang, Yuming <[email protected]> 于2019年11月14日周四 下午2:05写道:
> > >
> > > +1
> > > Tested Parquet 1.11.0 with Spark SQL module: build/sbt "sql/test-only"
> > -Phadoop-3.2
> > >
> > > On 2019/11/13, 21:33, "Gabor Szadovszky" <[email protected]> wrote:
> > >
> > >     Hi everyone,
> > >
> > >     I propose the following RC to be released as official Apache
> Parquet
> > 1.11.0
> > >     release.
> > >
> > >     The commit id is 18519eb8e059865652eee3ff0e8593f126701da4
> > >     * This corresponds to the tag: apache-parquet-1.11.0-rc7
> > >     *
> > >
> >
> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fparquet-mr%2Ftree%2F18519eb8e059865652eee3ff0e8593f126701da4&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=ToLFrTB9lU%2FGzH6UpXwy7PAY7kaupbyKAgdghESCfgg%3D&amp;reserved=0
> > >
> > >     The release tarball, signature, and checksums are here:
> > >     *
> >
> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdist.apache.org%2Frepos%2Fdist%2Fdev%2Fparquet%2Fapache-parquet-1.11.0-rc7&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=MPaHiYJT7ZcqreAYUkvDvZugthUhRPrySdXpN2ytT5k%3D&amp;reserved=0
> > >
> > >     You can find the KEYS file here:
> > >     *
> >
> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fapache.org%2Fdist%2Fparquet%2FKEYS&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=IwG4MUGsP2lVzlD4bwZUEPuEAPUg%2FHXRYtxc5CQupBM%3D&amp;reserved=0
> > >
> > >     Binary artifacts are staged in Nexus here:
> > >     *
> >
> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Frepository.apache.org%2Fcontent%2Fgroups%2Fstaging%2Forg%2Fapache%2Fparquet%2F&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=lHtqLRQqQFwsyoaLSVaJuau5gxPKsCQFFVJaY8H0tZQ%3D&amp;reserved=0
> > >
> > >     This release includes the changes listed at:
> > >
> >
> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fparquet-mr%2Fblob%2Fapache-parquet-1.11.0-rc7%2FCHANGES.md&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=82BplI3bLAL6qArLHvVoYReZOk%2BboSP655rI8VX5Q5I%3D&amp;reserved=0
> > >
> > >     Please download, verify, and test.
> > >
> > >     Please vote in the next 72 hours.
> > >
> > >     [ ] +1 Release this as Apache Parquet 1.11.0
> > >     [ ] +0
> > >     [ ] -1 Do not release this because...
> > >
> > >
> >
>

Reply via email to