The dependency versions are consistent in our artifact:
$ mvn dependency:tree | grep parquet
[INFO] | \- org.apache.parquet:parquet-avro:jar:1.11.0:compile
[INFO] | \- org.apache.parquet:parquet-format-structures:jar:1.11.0:compile
[INFO] | +- org.apache.parquet:parquet-column:jar:1.11.0:compile
[INFO] | | +- org.apache.parquet:parquet-common:jar:1.11.0:compile
[INFO] | | \- org.apache.parquet:parquet-encoding:jar:1.11.0:compile
[INFO] | +- org.apache.parquet:parquet-hadoop:jar:1.11.0:compile
[INFO] | | +- org.apache.parquet:parquet-jackson:jar:1.11.0:compile
The latter error occurs when I attempt to run via spark-submit on Spark 2.4.4:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage
0.0 (TID 0, localhost, executor driver): java.lang.NoSuchMethodError:
org.apache.parquet.schema.Types$PrimitiveBuilder.as(Lorg/apache/parquet/schema/LogicalTypeAnnotation;)Lorg/apache/parquet/schema/Types$Builder;
  at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:161)
$ spark-submit --version
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.4
/_/
Using Scala version 2.11.12, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_191
Branch
Compiled by user on 2019-08-27T21:21:38Z
Revision
Url
Type --help for more information.
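
A quick way to confirm which parquet-column jar actually resolves at runtime is
to print the code source of one of the affected classes. A minimal diagnostic
sketch (illustrative only; the class name is invented), which can be run through
spark-submit to see whether Spark's bundled 1.10.x jars shadow the application's
1.11.0 dependency:

import org.apache.parquet.schema.Types;

public class WhichParquet {
  public static void main(String[] args) {
    // Prints the jar that provides parquet-column's Types class.
    System.out.println(Types.class.getProtectionDomain()
        .getCodeSource().getLocation());
  }
}
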
> On Nov 21, 2019, at 6:06 PM, Ryan Blue <[email protected]> wrote:
>
> Thanks for looking into it, Nandor. That doesn't sound like a problem with
> Parquet, but a problem with the test environment since parquet-avro depends
> on a newer API method.
>
> On Thu, Nov 21, 2019 at 3:58 PM Nandor Kollar <[email protected]>
> wrote:
>
>> I'm not sure that this is a binary compatibility issue. The missing builder
>> method was recently added in 1.11.0 with the introduction of the new
>> logical type API, while the original version of this method (the one with a
>> single OriginalType parameter, which AvroSchemaConverter called before) is
>> kept untouched. It seems to me that the Parquet versions on the Spark
>> executor mismatch: parquet-avro is on 1.11.0, but parquet-column is still
>> on an older version.
>>
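
To make the mismatch concrete: parquet-column 1.11.0 carries both overloads of
the builder's as() method, and parquet-avro 1.11.0 links against the new one. A
minimal sketch (illustrative only, not project code) that compiles against
1.11.0 but throws the NoSuchMethodError above when an older parquet-column is
first on the runtime classpath:

import org.apache.parquet.schema.LogicalTypeAnnotation;
import org.apache.parquet.schema.OriginalType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.apache.parquet.schema.Types;

public class OverloadCheck {
  public static void main(String[] args) {
    // Old overload: present in parquet-column 1.10.x, kept untouched in 1.11.0.
    Types.optional(PrimitiveTypeName.BINARY).as(OriginalType.UTF8).named("s1");
    // New overload, added in 1.11.0 with the logical type API: this is the
    // call site that fails when parquet-column on the classpath is older.
    Types.optional(PrimitiveTypeName.BINARY)
        .as(LogicalTypeAnnotation.stringType())
        .named("s2");
  }
}
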
>> On Thu, Nov 21, 2019 at 11:41 PM Michael Heuer <[email protected]> wrote:
>>
>>> Perhaps not strictly necessary to say, but if this particular
>>> compatibility break between 1.10 and 1.11 was intentional, and no other
>>> compatibility breaks are found, I would vote -1 (non-binding) on this RC
>>> such that we might go back and revisit the changes to preserve
>>> compatibility.
>>>
>>> I am not sure there is presently enough motivation in the Spark project
>>> for a release after 2.4.4 and before 3.0 in which to bump the Parquet
>>> dependency version to 1.11.x.
>>>
>>> michael
>>>
>>>
>>>> On Nov 21, 2019, at 11:01 AM, Ryan Blue <[email protected]> wrote:
>>>>
>>>> Gabor, shouldn't Parquet be binary compatible for public APIs? From the
>>>> stack trace, it looks like this 1.11.0 RC breaks binary compatibility in
>>>> the type builders.
>>>>
>>>> Looks like this should have been caught by the binary compatibility
>>>> checks.
>>>>
>>>> On Thu, Nov 21, 2019 at 8:56 AM Gabor Szadovszky <[email protected]> wrote:
>>>>
>>>>> Hi Michael,
>>>>>
>>>>> Unfortunately, I don't have too much experience with Spark. But if Spark
>>>>> uses the parquet-mr library in an embedded way (that's how Hive uses it),
>>>>> it is required to re-build Spark with the 1.11 RC of parquet-mr.
>>>>>
>>>>> Regards,
>>>>> Gabor
>>>>>
>>>>> On Wed, Nov 20, 2019 at 5:44 PM Michael Heuer <[email protected]> wrote:
>>>>>
>>>>>> It appears a provided-scope dependency on spark-sql that leaks old
>>>>>> Parquet versions was causing the runtime error below. After including new
>>>>>> parquet-column and parquet-hadoop compile-scope dependencies (in addition
>>>>>> to parquet-avro, which we already have at compile scope), our build
>>>>>> succeeds.
>>>>>>
>>>>>> https://github.com/bigdatagenomics/adam/pull/2232
>>>>>>
>>>>>> However, when running via spark-submit I run into a similar runtime
>>>>>> error:
>>>>>>
>>>>>> Caused by: java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$PrimitiveBuilder.as(Lorg/apache/parquet/schema/LogicalTypeAnnotation;)Lorg/apache/parquet/schema/Types$Builder;
>>>>>>   at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:161)
>>>>>>   at org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:226)
>>>>>>   at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:182)
>>>>>>   at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:141)
>>>>>>   at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:244)
>>>>>>   at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:135)
>>>>>>   at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:126)
>>>>>>   at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:121)
>>>>>>   at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:388)
>>>>>>   at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
>>>>>>   at org.apache.spark.rdd.InstrumentedOutputFormat.getRecordWriter(InstrumentedOutputFormat.scala:35)
>>>>>>   at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.initWriter(SparkHadoopWriter.scala:350)
>>>>>>   at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:120)
>>>>>>   at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
>>>>>>   at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
>>>>>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>>>>>>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>>>>>>   at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>>>>>>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>>>>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>   at java.lang.Thread.run(Thread.java:748)
>>>>>>
>>>>>>
>>>>>> Will bumping our library dependency version to 1.11 require a new
>>>>>> version of Spark, built against Parquet 1.11?
>>>>>>
>>>>>> Please accept my apologies if this is heading out of scope for the
>>>>>> Parquet mailing list.
>>>>>>
>>>>>> michael
>>>>>>
>>>>>>
>>>>>>> On Nov 20, 2019, at 10:00 AM, Michael Heuer <[email protected]> wrote:
>>>>>>>
>>>>>>> I am willing to do some benchmarking on genomic data at scale, but am
>>>>>>> not quite sure what the Spark target version for 1.11.0 might be. Will
>>>>>>> Parquet 1.11.0 be compatible with Spark 2.4.x?
>>>>>>>
>>>>>>> Updating from 1.10.1 to 1.11.0 breaks at runtime in our build:
>>>>>>>
>>>>>>> …
>>>>>>> D 0, localhost, executor driver): java.lang.NoClassDefFoundError: org/apache/parquet/schema/LogicalTypeAnnotation
>>>>>>>   at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:121)
>>>>>>>   at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:388)
>>>>>>>   at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
>>>>>>>   at org.apache.spark.rdd.InstrumentedOutputFormat.getRecordWriter(InstrumentedOutputFormat.scala:35)
>>>>>>>   at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.initWriter(SparkHadoopWriter.scala:350)
>>>>>>>   at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:120)
>>>>>>>   at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
>>>>>>>   at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
>>>>>>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>>>>>>>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>>>>>>>   at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>>>>>>>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>>>>>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>>>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>   at java.lang.Thread.run(Thread.java:748)
>>>>>>> Caused by: java.lang.ClassNotFoundException: org.apache.parquet.schema.LogicalTypeAnnotation
>>>>>>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>>>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>>>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>>>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>>>
>>>>>>> michael
>>>>>>>
>>>>>>>
>>>>>>>> On Nov 20, 2019, at 3:29 AM, Gabor Szadovszky <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Thanks, Fokko.
>>>>>>>>
>>>>>>>> Ryan, we did not do such measurements yet. I'm afraid I won't have
>>>>>>>> enough time to do that in the next couple of weeks.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Gabor
>>>>>>>>
>>>>>>>> On Tue, Nov 19, 2019 at 6:14 PM Driesprong, Fokko <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Gabor for the explanation. I'd like to change my vote to +1
>>>>>>>>> (non-binding).
>>>>>>>>>
>>>>>>>>> Cheers, Fokko
>>>>>>>>>
>>>>>>>>> On Tue, 19 Nov 2019 at 18:03, Ryan Blue <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Gabor, what I meant was: have we tried this with real data to see
>>>>>>>>>> the effect? I think those results would be helpful.
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 18, 2019 at 11:35 PM Gabor Szadovszky <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>
>>>>>>>>>>> It is not easy to calculate. For the column indexes feature we
>>>>>>>>>>> introduced two new structures saved before the footer: column
>>>>>>>>>>> indexes and offset indexes. If the min/max values are not too long,
>>>>>>>>>>> then the truncation might not decrease the file size because of the
>>>>>>>>>>> offset indexes. Moreover, we also introduced
>>>>>>>>>>> parquet.page.row.count.limit, which might increase the number of
>>>>>>>>>>> pages, which leads to increasing the file size.
>>>>>>>>>>> The footer itself is also changed and we are saving more values in
>>>>>>>>>>> it: the offset values to the column/offset indexes, the new logical
>>>>>>>>>>> type structures, the CRC checksums (we might have some others).
>>>>>>>>>>> So, the size of files with a small amount of data will be increased
>>>>>>>>>>> (because of the larger footer). The size of files where the values
>>>>>>>>>>> can be encoded very well (RLE) will probably be increased (because
>>>>>>>>>>> we will have more pages). The size of some files where the values
>>>>>>>>>>> are long (>64 bytes by default) might be decreased because of
>>>>>>>>>>> truncating the min/max values.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Gabor
>>>>>>>>>>>
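
The structures Gabor describes can be inspected in a concrete file. A hedged
sketch against the 1.11.0 API (the input path is supplied by the caller; I
would expect readColumnIndex to return null for files written without column
indexes, e.g. by 1.10.1):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class FooterInspect {
  public static void main(String[] args) throws Exception {
    // args[0]: path to a Parquet file written by 1.10.1 or 1.11.0
    try (ParquetFileReader reader = ParquetFileReader.open(
        HadoopInputFile.fromPath(new Path(args[0]), new Configuration()))) {
      for (BlockMetaData block : reader.getFooter().getBlocks()) {
        for (ColumnChunkMetaData column : block.getColumns()) {
          // Column indexes live outside the page data, just before the footer.
          System.out.println(column.getPath() + " -> "
              + reader.readColumnIndex(column));
        }
      }
    }
  }
}
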
>>>>>>>>>>> On Mon, Nov 18, 2019 at 6:46 PM Ryan Blue <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Gabor, do we have an idea of the additional overhead for a
>>>>>>>>>>>> non-test data file? It should be easy to validate that this
>>>>>>>>>>>> doesn't introduce an unreasonable amount of overhead. In some
>>>>>>>>>>>> cases, it should actually be smaller since the column indexes are
>>>>>>>>>>>> truncated and page stats are not.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Nov 18, 2019 at 1:00 AM Gabor Szadovszky
>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Fokko,
>>>>>>>>>>>>>
>>>>>>>>>>>>> For the first point: the referenced constructor is private and
>>>>>>>>>>>>> Iceberg uses it via reflection. It is not a breaking change. I
>>>>>>>>>>>>> think parquet-mr shall not keep private methods only because
>>>>>>>>>>>>> clients might use them via reflection.
>>>>>>>>>>>>>
>>>>>>>>>>>>> About the checksum: I've agreed on having the CRC checksum write
>>>>>>>>>>>>> enabled by default because the benchmarks did not show significant
>>>>>>>>>>>>> performance penalties. See
>>>>>>>>>>>>> https://github.com/apache/parquet-mr/pull/647 for details.
>>>>>>>>>>>>>
>>>>>>>>>>>>> About the file size change: 1.11.0 is introducing column indexes,
>>>>>>>>>>>>> CRC checksums, removing the statistics from the page headers, and
>>>>>>>>>>>>> maybe other changes that impact file size. If only file size is in
>>>>>>>>>>>>> question, I cannot see a breaking change here.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Gabor
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
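
On the reflection point, the fragility is easy to see: a reflective caller must
name the private constructor's exact parameter list, which no binary
compatibility check covers. A small diagnostic sketch (illustrative only) that
lists what the parquet-hadoop jar on the classpath actually declares:

import java.lang.reflect.Constructor;

public class PrivateCtorLookup {
  public static void main(String[] args) throws Exception {
    Class<?> cls = Class.forName(
        "org.apache.parquet.hadoop.ColumnChunkPageWriteStore");
    // A reflective caller like Iceberg looks the constructor up with an exact
    // parameter list; when that list changes between releases, the lookup
    // throws NoSuchMethodException at runtime. Listing the declared
    // constructors shows what the jar in use actually provides.
    for (Constructor<?> ctor : cls.getDeclaredConstructors()) {
      System.out.println(ctor);
    }
  }
}
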
>>>>>>>>>>>>> On Sun, Nov 17, 2019 at 9:27 PM Driesprong, Fokko <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Unfortunately, a -1 from my side (non-binding).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've updated Iceberg to Parquet 1.11.0, and found three things:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - We've broken backward compatibility of the constructor of
>>>>>>>>>>>>>> ColumnChunkPageWriteStore:
>>>>>>>>>>>>>> https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e#diff-d007a18083a2431c30a5416f248e0a4bR80
>>>>>>>>>>>>>> This required a change to Iceberg:
>>>>>>>>>>>>>> https://github.com/apache/incubator-iceberg/pull/297/files#diff-b877faa96f292b851c75fe8bcc1912f8R176
>>>>>>>>>>>>>> This isn't a hard blocker, but if there will be a new RC, I've
>>>>>>>>>>>>>> submitted a patch: https://github.com/apache/parquet-mr/pull/699
>>>>>>>>>>>>>> - Related, and something we need to put in the changelog:
>>>>>>>>>>>>>> checksums are enabled by default:
>>>>>>>>>>>>>> https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L54
>>>>>>>>>>>>>> This will impact performance. I would suggest disabling it by
>>>>>>>>>>>>>> default: https://github.com/apache/parquet-mr/pull/700
>>>>>>>>>>>>>> https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e#diff-d007a18083a2431c30a5416f248e0a4bR277
>>>>>>>>>>>>>> - Binary compatibility. While updating Iceberg, I noticed that
>>>>>>>>>>>>>> the split test was failing:
>>>>>>>>>>>>>> https://github.com/apache/incubator-iceberg/pull/297/files#diff-4b64b7014f259be41b26cfb73d3e6e93L199
>>>>>>>>>>>>>> The two records are now divided over four Spark partitions.
>>>>>>>>>>>>>> Something in the output has changed, since the files are bigger
>>>>>>>>>>>>>> now. Does anyone have an idea of what's changed, or a way to
>>>>>>>>>>>>>> check this? The only thing I can think of is the checksum
>>>>>>>>>>>>>> mentioned above.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ ls -lah ~/Desktop/parquet-1-1*
>>>>>>>>>>>>>> -rw-r--r--  1 fokkodriesprong  staff   562B 17 nov 21:09 /Users/fokkodriesprong/Desktop/parquet-1-10-1.parquet
>>>>>>>>>>>>>> -rw-r--r--  1 fokkodriesprong  staff   611B 17 nov 21:05 /Users/fokkodriesprong/Desktop/parquet-1-11-0.parquet
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ parquet-tools cat /Users/fokkodriesprong/Desktop/parquet-1-10-1.parquet
>>>>>>>>>>>>>> id = 1
>>>>>>>>>>>>>> data = a
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ parquet-tools cat /Users/fokkodriesprong/Desktop/parquet-1-11-0.parquet
>>>>>>>>>>>>>> id = 1
>>>>>>>>>>>>>> data = a
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A binary diff here:
>>>>>>>>>>>>>> https://gist.github.com/Fokko/1c209f158299dc2fb5878c5bae4bf6d8
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers, Fokko
>>>>>>>>>>>>>>
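
To isolate the checksum contribution to the size difference above, the same
test record can be written with page-level CRCs switched off. A hedged sketch
against the 1.11.0 writer API (schema and output path invented for
illustration; withPageWriteChecksumEnabled is my reading of the new toggle):

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public class ChecksumToggle {
  public static void main(String[] args) throws Exception {
    // Mirrors the two-column test file above: id = 1, data = a.
    Schema schema = SchemaBuilder.record("rec").fields()
        .requiredLong("id").requiredString("data").endRecord();
    GenericRecord record = new GenericData.Record(schema);
    record.put("id", 1L);
    record.put("data", "a");
    try (ParquetWriter<GenericRecord> writer =
        AvroParquetWriter.<GenericRecord>builder(new Path(args[0]))
            .withSchema(schema)
            .withPageWriteChecksumEnabled(false) // default is true in 1.11.0
            .build()) {
      writer.write(record);
    }
  }
}

Comparing the resulting size against a file written with defaults should show
how much of the 562B -> 611B growth comes from the CRCs versus the new index
structures.
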
>>>>>>>>>>>>>> On Sat, 16 Nov 2019 at 04:18, Junjie Chen <[email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>> Verified signature, checksum and ran mvn install successfully.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, 14 Nov 2019 at 2:05 PM, Wang, Yuming <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>> Tested Parquet 1.11.0 with the Spark SQL module: build/sbt
>>>>>>>>>>>>>>>> "sql/test-only" -Phadoop-3.2
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2019/11/13, 21:33, "Gabor Szadovszky" <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I propose the following RC to be released as the official
>>>>>>>>>>>>>>>> Apache Parquet 1.11.0 release.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The commit id is 18519eb8e059865652eee3ff0e8593f126701da4
>>>>>>>>>>>>>>>> * This corresponds to the tag: apache-parquet-1.11.0-rc7
>>>>>>>>>>>>>>>> * https://github.com/apache/parquet-mr/tree/18519eb8e059865652eee3ff0e8593f126701da4
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The release tarball, signature, and checksums are here:
>>>>>>>>>>>>>>>> * https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-1.11.0-rc7
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You can find the KEYS file here:
>>>>>>>>>>>>>>>> * https://apache.org/dist/parquet/KEYS
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Binary artifacts are staged in Nexus here:
>>>>>>>>>>>>>>>> * https://repository.apache.org/content/groups/staging/org/apache/parquet/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This release includes the changes listed at:
>>>>>>>>>>>>>>>> https://github.com/apache/parquet-mr/blob/apache-parquet-1.11.0-rc7/CHANGES.md
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please download, verify, and test.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please vote in the next 72 hours.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [ ] +1 Release this as Apache Parquet 1.11.0
>>>>>>>>>>>>>>>> [ ] +0
>>>>>>>>>>>>>>>> [ ] -1 Do not release this because...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>> Software Engineer
>>>>>>>>>>>> Netflix
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Ryan Blue
>>>>>>>>>> Software Engineer
>>>>>>>>>> Netflix
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>
>>>
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix