Forgot to mention also moving master to 1.12.0-SNAPSHOT to validate that
all things transition ok too and that someone does not accidentally merge a
PR that does not end up in the correct branch.

On Tue, Nov 19, 2019 at 10:35 AM Ismaël Mejía <ieme...@gmail.com> wrote:

> +1
>
> Downloaded release code, checked hashes/signatures, run full tests and
> installed locally with zero errors. Tested integration on a downstream
> project (Apache Beam) and no issues (Note that we don't use any of the new
> features yet).
>
> Gabor, can you please create a corresponding parquet-1.11.x branch. I
> expected to compare the release with the branch and tag but I found the
> branch is not present.
>
> Thanks,
> Ismaël
>
>
>
> On Tue, Nov 19, 2019 at 8:35 AM Gabor Szadovszky <ga...@apache.org> wrote:
>
>> Hi Ryan,
>>
>> It is not easy to calculate. For the column indexes feature we introduced
>> two new structures saved before the footer: column indexes and offset
>> indexes. If the min/max values are not too long, then the truncation might
>> not decrease the file size because of the offset indexes. Moreover, we
>> also
>> introduced parquet.page.row.count.limit which might increase the number of
>> pages which leads to increasing the file size.
>> The footer itself is also changed and we are saving more values in it: the
>> offset values to the column/offset indexes, the new logical type
>> structures, the CRC checksums (we might have some others).
>> So, the size of the files with small amount of data will be increased
>> (because of the larger footer). The size of the files where the values can
>> be encoded very well (RLE) will probably be increased (because we will
>> have
>> more pages). The size of some files where the values are long (>64bytes by
>> default) might be decreased because of truncating the min/max values.
>>
>> Regards,
>> Gabor
>>
>> On Mon, Nov 18, 2019 at 6:46 PM Ryan Blue <rb...@netflix.com.invalid>
>> wrote:
>>
>> > Gabor, do we have an idea of the additional overhead for a non-test data
>> > file? It should be easy to validate that this doesn't introduce an
>> > unreasonable amount of overhead. In some cases, it should actually be
>> > smaller since the column indexes are truncated and page stats are not.
>> >
>> > On Mon, Nov 18, 2019 at 1:00 AM Gabor Szadovszky
>> > <gabor.szadovs...@cloudera.com.invalid> wrote:
>> >
>> > > Hi Fokko,
>> > >
>> > > For the first point. The referenced constructor is private and Iceberg
>> > uses
>> > > it via reflection. It is not a breaking change. I think, parquet-mr
>> shall
>> > > not keep private methods only because of clients might use them via
>> > > reflection.
>> > >
>> > > About the checksum. I've agreed on having the CRC checksum write
>> enabled
>> > by
>> > > default because the benchmarks did not show significant performance
>> > > penalties. See https://github.com/apache/parquet-mr/pull/647 for
>> > details.
>> > >
>> > > About the file size change. 1.11.0 is introducing column indexes, CRC
>> > > checksum, removing the statistics from the page headers and maybe
>> other
>> > > changes that impact file size. If only file size is in question I
>> cannot
>> > > see a breaking change here.
>> > >
>> > > Regards,
>> > > Gabor
>> > >
>> > >
>> > >
>> > > On Sun, Nov 17, 2019 at 9:27 PM Driesprong, Fokko
>> <fo...@driesprong.frl>
>> > > wrote:
>> > >
>> > > > Unfortunately, a -1 from my side (non-binding)
>> > > >
>> > > > I've updated Iceberg to Parquet 1.11.0, and found three things:
>> > > >
>> > > >    - We've broken backward compatibility of the constructor of
>> > > >    ColumnChunkPageWriteStore
>> > > >    <
>> > > >
>> > >
>> >
>> https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e#diff-d007a18083a2431c30a5416f248e0a4bR80
>> > > > >.
>> > > >    This required a change
>> > > >    <
>> > > >
>> > >
>> >
>> https://github.com/apache/incubator-iceberg/pull/297/files#diff-b877faa96f292b851c75fe8bcc1912f8R176
>> > > > >
>> > > >    to the code. This isn't a hard blocker, but if there will be a
>> new
>> > RC,
>> > > > I've
>> > > >    submitted a patch: https://github.com/apache/parquet-mr/pull/699
>> > > >    - Related, that we need to put in the changelog, is that
>> checksums
>> > are
>> > > >    enabled by default:
>> > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L54
>> > > > This
>> > > >    will impact performance. I would suggest disabling it by default:
>> > > >    https://github.com/apache/parquet-mr/pull/700
>> > > >    <
>> > > >
>> > >
>> >
>> https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e#diff-d007a18083a2431c30a5416f248e0a4bR277
>> > > > >
>> > > >    - Binary compatibility. While updating Iceberg, I've noticed that
>> > the
>> > > >    split-test was failing:
>> > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/incubator-iceberg/pull/297/files#diff-4b64b7014f259be41b26cfb73d3e6e93L199
>> > > > The
>> > > >    two records are now divided over four Spark partitions.
>> Something in
>> > > the
>> > > >    output has changed since the files are bigger now. Has anyone any
>> > idea
>> > > > to
>> > > >    check what's changed, or a way to check this? The only thing I
>> can
>> > > > think of
>> > > >    is the checksum mentioned above.
>> > > >
>> > > > $ ls -lah ~/Desktop/parquet-1-1*
>> > > > -rw-r--r--  1 fokkodriesprong  staff   562B 17 nov 21:09
>> > > > /Users/fokkodriesprong/Desktop/parquet-1-10-1.parquet
>> > > > -rw-r--r--  1 fokkodriesprong  staff   611B 17 nov 21:05
>> > > > /Users/fokkodriesprong/Desktop/parquet-1-11-0.parquet
>> > > >
>> > > > $ parquet-tools cat
>> > /Users/fokkodriesprong/Desktop/parquet-1-10-1.parquet
>> > > > id = 1
>> > > > data = a
>> > > >
>> > > > $ parquet-tools cat
>> > /Users/fokkodriesprong/Desktop/parquet-1-11-0.parquet
>> > > > id = 1
>> > > > data = a
>> > > >
>> > > > A binary diff here:
>> > > > https://gist.github.com/Fokko/1c209f158299dc2fb5878c5bae4bf6d8
>> > > >
>> > > > Cheers, Fokko
>> > > >
>> > > > Op za 16 nov. 2019 om 04:18 schreef Junjie Chen <
>> > > chenjunjied...@gmail.com
>> > > > >:
>> > > >
>> > > > > +1
>> > > > > Verified signature, checksum and ran mvn install successfully.
>> > > > >
>> > > > > Wang, Yuming <yumw...@ebay.com.invalid> 于2019年11月14日周四 下午2:05写道:
>> > > > > >
>> > > > > > +1
>> > > > > > Tested Parquet 1.11.0 with Spark SQL module: build/sbt
>> > > "sql/test-only"
>> > > > > -Phadoop-3.2
>> > > > > >
>> > > > > > On 2019/11/13, 21:33, "Gabor Szadovszky" <ga...@apache.org>
>> > wrote:
>> > > > > >
>> > > > > >     Hi everyone,
>> > > > > >
>> > > > > >     I propose the following RC to be released as official Apache
>> > > > Parquet
>> > > > > 1.11.0
>> > > > > >     release.
>> > > > > >
>> > > > > >     The commit id is 18519eb8e059865652eee3ff0e8593f126701da4
>> > > > > >     * This corresponds to the tag: apache-parquet-1.11.0-rc7
>> > > > > >     *
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fparquet-mr%2Ftree%2F18519eb8e059865652eee3ff0e8593f126701da4&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=ToLFrTB9lU%2FGzH6UpXwy7PAY7kaupbyKAgdghESCfgg%3D&amp;reserved=0
>> > > > > >
>> > > > > >     The release tarball, signature, and checksums are here:
>> > > > > >     *
>> > > > >
>> > > >
>> > >
>> >
>> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdist.apache.org%2Frepos%2Fdist%2Fdev%2Fparquet%2Fapache-parquet-1.11.0-rc7&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=MPaHiYJT7ZcqreAYUkvDvZugthUhRPrySdXpN2ytT5k%3D&amp;reserved=0
>> > > > > >
>> > > > > >     You can find the KEYS file here:
>> > > > > >     *
>> > > > >
>> > > >
>> > >
>> >
>> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fapache.org%2Fdist%2Fparquet%2FKEYS&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=IwG4MUGsP2lVzlD4bwZUEPuEAPUg%2FHXRYtxc5CQupBM%3D&amp;reserved=0
>> > > > > >
>> > > > > >     Binary artifacts are staged in Nexus here:
>> > > > > >     *
>> > > > >
>> > > >
>> > >
>> >
>> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Frepository.apache.org%2Fcontent%2Fgroups%2Fstaging%2Forg%2Fapache%2Fparquet%2F&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=lHtqLRQqQFwsyoaLSVaJuau5gxPKsCQFFVJaY8H0tZQ%3D&amp;reserved=0
>> > > > > >
>> > > > > >     This release includes the changes listed at:
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fparquet-mr%2Fblob%2Fapache-parquet-1.11.0-rc7%2FCHANGES.md&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=82BplI3bLAL6qArLHvVoYReZOk%2BboSP655rI8VX5Q5I%3D&amp;reserved=0
>> > > > > >
>> > > > > >     Please download, verify, and test.
>> > > > > >
>> > > > > >     Please vote in the next 72 hours.
>> > > > > >
>> > > > > >     [ ] +1 Release this as Apache Parquet 1.11.0
>> > > > > >     [ ] +0
>> > > > > >     [ ] -1 Do not release this because...
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> >
>> > --
>> > Ryan Blue
>> > Software Engineer
>> > Netflix
>> >
>>
>

Reply via email to