Not changing the version to 1.12 was also intentional. Until we have a
successful vote for 1.11.0 it is not released and therefore we are still
working on 1.11. I'll upgrade the version to 1.12.0-SNAPSHOT after 1.11.0
is released.


On Tue, Nov 19, 2019 at 11:04 AM Gabor Szadovszky <ga...@apache.org> wrote:

> Hi Ismaël,
>
> Thanks for checking the release.
>
> Branch was not created because usually we release the current master for
> major/minor releases. Everything was on master is part of the current RC.
> That was what we agreed on at the last parquet sync. We usually create
> branches for patch releases because we are adding only some controlled
> fixes.
> What would you like to compare? The current RC and previous one have they
> own tags already. For the previous RCs you need the check the previous
> voting mails.
>
> Regards,
> Gabor
>
> On Tue, Nov 19, 2019 at 10:35 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>
>> +1
>>
>> Downloaded release code, checked hashes/signatures, run full tests and
>> installed locally with zero errors. Tested integration on a downstream
>> project (Apache Beam) and no issues (Note that we don't use any of the new
>> features yet).
>>
>> Gabor, can you please create a corresponding parquet-1.11.x branch. I
>> expected to compare the release with the branch and tag but I found the
>> branch is not present.
>>
>> Thanks,
>> Ismaël
>>
>>
>>
>> On Tue, Nov 19, 2019 at 8:35 AM Gabor Szadovszky <ga...@apache.org>
>> wrote:
>>
>> > Hi Ryan,
>> >
>> > It is not easy to calculate. For the column indexes feature we
>> introduced
>> > two new structures saved before the footer: column indexes and offset
>> > indexes. If the min/max values are not too long, then the truncation
>> might
>> > not decrease the file size because of the offset indexes. Moreover, we
>> also
>> > introduced parquet.page.row.count.limit which might increase the number
>> of
>> > pages which leads to increasing the file size.
>> > The footer itself is also changed and we are saving more values in it:
>> the
>> > offset values to the column/offset indexes, the new logical type
>> > structures, the CRC checksums (we might have some others).
>> > So, the size of the files with small amount of data will be increased
>> > (because of the larger footer). The size of the files where the values
>> can
>> > be encoded very well (RLE) will probably be increased (because we will
>> have
>> > more pages). The size of some files where the values are long (>64bytes
>> by
>> > default) might be decreased because of truncating the min/max values.
>> >
>> > Regards,
>> > Gabor
>> >
>> > On Mon, Nov 18, 2019 at 6:46 PM Ryan Blue <rb...@netflix.com.invalid>
>> > wrote:
>> >
>> > > Gabor, do we have an idea of the additional overhead for a non-test
>> data
>> > > file? It should be easy to validate that this doesn't introduce an
>> > > unreasonable amount of overhead. In some cases, it should actually be
>> > > smaller since the column indexes are truncated and page stats are not.
>> > >
>> > > On Mon, Nov 18, 2019 at 1:00 AM Gabor Szadovszky
>> > > <gabor.szadovs...@cloudera.com.invalid> wrote:
>> > >
>> > > > Hi Fokko,
>> > > >
>> > > > For the first point. The referenced constructor is private and
>> Iceberg
>> > > uses
>> > > > it via reflection. It is not a breaking change. I think, parquet-mr
>> > shall
>> > > > not keep private methods only because of clients might use them via
>> > > > reflection.
>> > > >
>> > > > About the checksum. I've agreed on having the CRC checksum write
>> > enabled
>> > > by
>> > > > default because the benchmarks did not show significant performance
>> > > > penalties. See https://github.com/apache/parquet-mr/pull/647 for
>> > > details.
>> > > >
>> > > > About the file size change. 1.11.0 is introducing column indexes,
>> CRC
>> > > > checksum, removing the statistics from the page headers and maybe
>> other
>> > > > changes that impact file size. If only file size is in question I
>> > cannot
>> > > > see a breaking change here.
>> > > >
>> > > > Regards,
>> > > > Gabor
>> > > >
>> > > >
>> > > >
>> > > > On Sun, Nov 17, 2019 at 9:27 PM Driesprong, Fokko
>> <fo...@driesprong.frl
>> > >
>> > > > wrote:
>> > > >
>> > > > > Unfortunately, a -1 from my side (non-binding)
>> > > > >
>> > > > > I've updated Iceberg to Parquet 1.11.0, and found three things:
>> > > > >
>> > > > >    - We've broken backward compatibility of the constructor of
>> > > > >    ColumnChunkPageWriteStore
>> > > > >    <
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e#diff-d007a18083a2431c30a5416f248e0a4bR80
>> > > > > >.
>> > > > >    This required a change
>> > > > >    <
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/incubator-iceberg/pull/297/files#diff-b877faa96f292b851c75fe8bcc1912f8R176
>> > > > > >
>> > > > >    to the code. This isn't a hard blocker, but if there will be a
>> new
>> > > RC,
>> > > > > I've
>> > > > >    submitted a patch:
>> https://github.com/apache/parquet-mr/pull/699
>> > > > >    - Related, that we need to put in the changelog, is that
>> checksums
>> > > are
>> > > > >    enabled by default:
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L54
>> > > > > This
>> > > > >    will impact performance. I would suggest disabling it by
>> default:
>> > > > >    https://github.com/apache/parquet-mr/pull/700
>> > > > >    <
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e#diff-d007a18083a2431c30a5416f248e0a4bR277
>> > > > > >
>> > > > >    - Binary compatibility. While updating Iceberg, I've noticed
>> that
>> > > the
>> > > > >    split-test was failing:
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/incubator-iceberg/pull/297/files#diff-4b64b7014f259be41b26cfb73d3e6e93L199
>> > > > > The
>> > > > >    two records are now divided over four Spark partitions.
>> Something
>> > in
>> > > > the
>> > > > >    output has changed since the files are bigger now. Has anyone
>> any
>> > > idea
>> > > > > to
>> > > > >    check what's changed, or a way to check this? The only thing I
>> can
>> > > > > think of
>> > > > >    is the checksum mentioned above.
>> > > > >
>> > > > > $ ls -lah ~/Desktop/parquet-1-1*
>> > > > > -rw-r--r--  1 fokkodriesprong  staff   562B 17 nov 21:09
>> > > > > /Users/fokkodriesprong/Desktop/parquet-1-10-1.parquet
>> > > > > -rw-r--r--  1 fokkodriesprong  staff   611B 17 nov 21:05
>> > > > > /Users/fokkodriesprong/Desktop/parquet-1-11-0.parquet
>> > > > >
>> > > > > $ parquet-tools cat
>> > > /Users/fokkodriesprong/Desktop/parquet-1-10-1.parquet
>> > > > > id = 1
>> > > > > data = a
>> > > > >
>> > > > > $ parquet-tools cat
>> > > /Users/fokkodriesprong/Desktop/parquet-1-11-0.parquet
>> > > > > id = 1
>> > > > > data = a
>> > > > >
>> > > > > A binary diff here:
>> > > > > https://gist.github.com/Fokko/1c209f158299dc2fb5878c5bae4bf6d8
>> > > > >
>> > > > > Cheers, Fokko
>> > > > >
>> > > > > Op za 16 nov. 2019 om 04:18 schreef Junjie Chen <
>> > > > chenjunjied...@gmail.com
>> > > > > >:
>> > > > >
>> > > > > > +1
>> > > > > > Verified signature, checksum and ran mvn install successfully.
>> > > > > >
>> > > > > > Wang, Yuming <yumw...@ebay.com.invalid> 于2019年11月14日周四
>> 下午2:05写道:
>> > > > > > >
>> > > > > > > +1
>> > > > > > > Tested Parquet 1.11.0 with Spark SQL module: build/sbt
>> > > > "sql/test-only"
>> > > > > > -Phadoop-3.2
>> > > > > > >
>> > > > > > > On 2019/11/13, 21:33, "Gabor Szadovszky" <ga...@apache.org>
>> > > wrote:
>> > > > > > >
>> > > > > > >     Hi everyone,
>> > > > > > >
>> > > > > > >     I propose the following RC to be released as official
>> Apache
>> > > > > Parquet
>> > > > > > 1.11.0
>> > > > > > >     release.
>> > > > > > >
>> > > > > > >     The commit id is 18519eb8e059865652eee3ff0e8593f126701da4
>> > > > > > >     * This corresponds to the tag: apache-parquet-1.11.0-rc7
>> > > > > > >     *
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fparquet-mr%2Ftree%2F18519eb8e059865652eee3ff0e8593f126701da4&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=ToLFrTB9lU%2FGzH6UpXwy7PAY7kaupbyKAgdghESCfgg%3D&amp;reserved=0
>> > > > > > >
>> > > > > > >     The release tarball, signature, and checksums are here:
>> > > > > > >     *
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdist.apache.org%2Frepos%2Fdist%2Fdev%2Fparquet%2Fapache-parquet-1.11.0-rc7&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=MPaHiYJT7ZcqreAYUkvDvZugthUhRPrySdXpN2ytT5k%3D&amp;reserved=0
>> > > > > > >
>> > > > > > >     You can find the KEYS file here:
>> > > > > > >     *
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fapache.org%2Fdist%2Fparquet%2FKEYS&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=IwG4MUGsP2lVzlD4bwZUEPuEAPUg%2FHXRYtxc5CQupBM%3D&amp;reserved=0
>> > > > > > >
>> > > > > > >     Binary artifacts are staged in Nexus here:
>> > > > > > >     *
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Frepository.apache.org%2Fcontent%2Fgroups%2Fstaging%2Forg%2Fapache%2Fparquet%2F&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=lHtqLRQqQFwsyoaLSVaJuau5gxPKsCQFFVJaY8H0tZQ%3D&amp;reserved=0
>> > > > > > >
>> > > > > > >     This release includes the changes listed at:
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fparquet-mr%2Fblob%2Fapache-parquet-1.11.0-rc7%2FCHANGES.md&amp;data=02%7C01%7Cyumwang%40ebay.com%7C8d588ca5855842a94bed08d7683e1221%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C637092488114756267&amp;sdata=82BplI3bLAL6qArLHvVoYReZOk%2BboSP655rI8VX5Q5I%3D&amp;reserved=0
>> > > > > > >
>> > > > > > >     Please download, verify, and test.
>> > > > > > >
>> > > > > > >     Please vote in the next 72 hours.
>> > > > > > >
>> > > > > > >     [ ] +1 Release this as Apache Parquet 1.11.0
>> > > > > > >     [ ] +0
>> > > > > > >     [ ] -1 Do not release this because...
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > Ryan Blue
>> > > Software Engineer
>> > > Netflix
>> > >
>> >
>>
>

Reply via email to