Ryan,
I would not trust our compatibility checks (semver) too much. Currently,
they are configured to compare our current version to 1.7.0, which means
anything added after 1.7.0 and then broken in a later release won't be
caught. In addition, many packages are excluded from the check for various
reasons. For example, org/apache/parquet/schema/** is excluded, so if this
really were an API compatibility issue, we certainly wouldn't catch it.
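
For reference, this is roughly what I mean (a sketch of a japicmp-style
plugin configuration; the exact plugin and exclude list in our pom.xml may
differ):

  <plugin>
    <groupId>com.github.siom79.japicmp</groupId>
    <artifactId>japicmp-maven-plugin</artifactId>
    <configuration>
      <oldVersion>
        <dependency>
          <groupId>org.apache.parquet</groupId>
          <artifactId>parquet-column</artifactId>
          <!-- baseline: anything added after 1.7.0 is invisible to the check -->
          <version>1.7.0</version>
        </dependency>
      </oldVersion>
      <parameter>
        <excludes>
          <!-- excluded packages (e.g. org/apache/parquet/schema/**) are never compared -->
          <exclude>org.apache.parquet.schema.*</exclude>
        </excludes>
      </parameter>
    </configuration>
  </plugin>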

Michael,
It fails because of a NoSuchMethodError pointing to a method that is newly
introduced in 1.11. Both the caller and the callee are shipped by
parquet-mr, so I'm quite sure it is a classpath issue. It seems that the
1.11 version of the parquet-column jar is not on the classpath.
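
A quick way to verify (a minimal snippet; run it wherever the job actually
executes, driver or executor):

  // Prints the jar that provides the Types class at runtime; if it is not
  // the 1.11.0 parquet-column jar, the classpath is the culprit.
  System.err.println(org.apache.parquet.schema.Types.class
      .getProtectionDomain().getCodeSource().getLocation());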


On Fri, Nov 22, 2019 at 1:44 AM Michael Heuer <[email protected]> wrote:

> The dependency versions are consistent in our artifact
>
> $ mvn dependency:tree | grep parquet
> [INFO] |  \- org.apache.parquet:parquet-avro:jar:1.11.0:compile
> [INFO] |     \-
> org.apache.parquet:parquet-format-structures:jar:1.11.0:compile
> [INFO] |  +- org.apache.parquet:parquet-column:jar:1.11.0:compile
> [INFO] |  |  +- org.apache.parquet:parquet-common:jar:1.11.0:compile
> [INFO] |  |  \- org.apache.parquet:parquet-encoding:jar:1.11.0:compile
> [INFO] |  +- org.apache.parquet:parquet-hadoop:jar:1.11.0:compile
> [INFO] |  |  +- org.apache.parquet:parquet-jackson:jar:1.11.0:compile
>
> The latter error
>
> Caused by: org.apache.spark.SparkException: Job aborted due to stage
> failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task
> 0.0 in stage 0.0 (TID 0, localhost, executor driver):
> java.lang.NoSuchMethodError:
> org.apache.parquet.schema.Types$PrimitiveBuilder.as(Lorg/apache/parquet/schema/LogicalTypeAnnotation;)Lorg/apache/parquet/schema/Types$Builder;
>         at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:161)
>
> occurs when I attempt to run via spark-submit on Spark 2.4.4
>
> $ spark-submit --version
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
>       /_/
>
> Using Scala version 2.11.12, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_191
> Branch
> Compiled by user  on 2019-08-27T21:21:38Z
> Revision
> Url
> Type --help for more information.
>
>
>
> > On Nov 21, 2019, at 6:06 PM, Ryan Blue <[email protected]>
> wrote:
> >
> > Thanks for looking into it, Nandor. That doesn't sound like a problem
> > with Parquet, but a problem with the test environment, since parquet-avro
> > depends on a newer API method.
> >
> > On Thu, Nov 21, 2019 at 3:58 PM Nandor Kollar <[email protected]> wrote:
> >
> >> I'm not sure that this is a binary compatibility issue. The missing
> >> builder method was recently added in 1.11.0 with the introduction of the
> >> new logical type API, while the original version of this method (the one
> >> with a single OriginalType input parameter, called before by
> >> AvroSchemaConverter) is kept untouched. It seems to me that the Parquet
> >> versions on the Spark executor mismatch: parquet-avro is on 1.11.0, but
> >> parquet-column is still on an older version.
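> >>
> >> (A simplified sketch of the two overloads on
> >> org.apache.parquet.schema.Types.Builder, for clarity; signatures
> >> abbreviated:)
> >>
> >>   public THIS as(OriginalType type);          // pre-1.11, kept untouched
> >>   public THIS as(LogicalTypeAnnotation type); // added in 1.11.0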
> >>
> >> On Thu, Nov 21, 2019 at 11:41 PM Michael Heuer <[email protected]> wrote:
> >>
> >>> Perhaps not strictly necessary to say, but if this particular
> >>> compatibility break between 1.10 and 1.11 was intentional, and no other
> >>> compatibility breaks are found, I would vote -1 (non-binding) on this RC
> >>> so that we might go back and revisit the changes to preserve
> >>> compatibility.
> >>>
> >>> I am not sure there is presently enough motivation in the Spark project
> >>> for a release after 2.4.4 and before 3.0 in which to bump the Parquet
> >>> dependency version to 1.11.x.
> >>>
> >>>   michael
> >>>
> >>>
> >>>> On Nov 21, 2019, at 11:01 AM, Ryan Blue <[email protected]> wrote:
> >>>>
> >>>> Gabor, shouldn't Parquet be binary compatible for public APIs? From
> >>>> the stack trace, it looks like this 1.11.0 RC breaks binary
> >>>> compatibility in the type builders.
> >>>>
> >>>> Looks like this should have been caught by the binary compatibility
> >>>> checks.
> >>>>
> >>>>> On Thu, Nov 21, 2019 at 8:56 AM Gabor Szadovszky <[email protected]> wrote:
> >>>>
> >>>>> Hi Michael,
> >>>>>
> >>>>> Unfortunately, I don't have too much experience with Spark. But if
> >>>>> Spark uses the parquet-mr library in an embedded way (that's how Hive
> >>>>> uses it), it is required to re-build Spark with the 1.11 RC of
> >>>>> parquet-mr.
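> >>>>>
> >>>>> (For example, something along these lines might do it; an untested
> >>>>> sketch that assumes Spark's parquet.version Maven property:)
> >>>>>
> >>>>>   ./build/mvn -DskipTests -Dparquet.version=1.11.0 clean package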
> >>>>>
> >>>>> Regards,
> >>>>> Gabor
> >>>>>
> >>>>> On Wed, Nov 20, 2019 at 5:44 PM Michael Heuer <[email protected]> wrote:
> >>>>>
> >>>>>> It appears that a provided-scope dependency on spark-sql was leaking
> >>>>>> old Parquet versions, causing the runtime error below. After
> >>>>>> including new parquet-column and parquet-hadoop compile-scope
> >>>>>> dependencies (in addition to parquet-avro, which we already have at
> >>>>>> compile scope), our build succeeds.
> >>>>>>
> >>>>>> https://github.com/bigdatagenomics/adam/pull/2232
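> >>>>>>
> >>>>>> (For reference, the compile-scope additions look roughly like this in
> >>>>>> our pom.xml; in the real build the versions are managed by a
> >>>>>> property:)
> >>>>>>
> >>>>>>   <dependency>
> >>>>>>     <groupId>org.apache.parquet</groupId>
> >>>>>>     <artifactId>parquet-column</artifactId>
> >>>>>>     <version>1.11.0</version>
> >>>>>>   </dependency>
> >>>>>>   <dependency>
> >>>>>>     <groupId>org.apache.parquet</groupId>
> >>>>>>     <artifactId>parquet-hadoop</artifactId>
> >>>>>>     <version>1.11.0</version>
> >>>>>>   </dependency>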
> >>>>>>
> >>>>>> However, when running via spark-submit, I run into a similar runtime
> >>>>>> error:
> >>>>>>
> >>>>>> Caused by: java.lang.NoSuchMethodError:
> >>>>>> org.apache.parquet.schema.Types$PrimitiveBuilder.as(Lorg/apache/parquet/schema/LogicalTypeAnnotation;)Lorg/apache/parquet/schema/Types$Builder;
> >>>>>>       at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:161)
> >>>>>>       at org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:226)
> >>>>>>       at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:182)
> >>>>>>       at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:141)
> >>>>>>       at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:244)
> >>>>>>       at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:135)
> >>>>>>       at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:126)
> >>>>>>       at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:121)
> >>>>>>       at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:388)
> >>>>>>       at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
> >>>>>>       at org.apache.spark.rdd.InstrumentedOutputFormat.getRecordWriter(InstrumentedOutputFormat.scala:35)
> >>>>>>       at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.initWriter(SparkHadoopWriter.scala:350)
> >>>>>>       at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:120)
> >>>>>>       at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
> >>>>>>       at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
> >>>>>>       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> >>>>>>       at org.apache.spark.scheduler.Task.run(Task.scala:123)
> >>>>>>       at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
> >>>>>>       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> >>>>>>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
> >>>>>>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >>>>>>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >>>>>>       at java.lang.Thread.run(Thread.java:748)
> >>>>>>
> >>>>>>
> >>>>>> Will bumping our library dependency version to 1.11 require a new
> >>>>>> version of Spark, built against Parquet 1.11?
> >>>>>>
> >>>>>> Please accept my apologies if this is heading out-of-scope for the
> >>>>>> Parquet mailing list.
> >>>>>>
> >>>>>>  michael
> >>>>>>
> >>>>>>
> >>>>>> On Nov 20, 2019, at 10:00 AM, Michael Heuer <[email protected]> wrote:
> >>>>>>>
> >>>>>>> I am willing to do some benchmarking on genomic data at scale, but
> >>>>>>> I am not quite sure what the Spark target version for 1.11.0 might
> >>>>>>> be. Will Parquet 1.11.0 be compatible with Spark 2.4.x?
> >>>>>>>
> >>>>>>> Updating from 1.10.1 to 1.11.0 breaks at runtime in our build:
> >>>>>>>
> >>>>>>> …
> >>>>>>> D 0, localhost, executor driver): java.lang.NoClassDefFoundError:
> >>>>>>> org/apache/parquet/schema/LogicalTypeAnnotation
> >>>>>>>     at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:121)
> >>>>>>>     at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:388)
> >>>>>>>     at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
> >>>>>>>     at org.apache.spark.rdd.InstrumentedOutputFormat.getRecordWriter(InstrumentedOutputFormat.scala:35)
> >>>>>>>     at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.initWriter(SparkHadoopWriter.scala:350)
> >>>>>>>     at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:120)
> >>>>>>>     at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
> >>>>>>>     at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
> >>>>>>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> >>>>>>>     at org.apache.spark.scheduler.Task.run(Task.scala:123)
> >>>>>>>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
> >>>>>>>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> >>>>>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
> >>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >>>>>>>     at java.lang.Thread.run(Thread.java:748)
> >>>>>>> Caused by: java.lang.ClassNotFoundException:
> >>>>>>> org.apache.parquet.schema.LogicalTypeAnnotation
> >>>>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> >>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >>>>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
> >>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >>>>>>>
> >>>>>>> michael
> >>>>>>>
> >>>>>>>
> >>>>>>> On Nov 20, 2019, at 3:29 AM, Gabor Szadovszky <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Thanks, Fokko.
> >>>>>>>>
> >>>>>>>> Ryan, we did not do such measurements yet. I'm afraid I won't have
> >>>>>>>> enough time to do that in the next couple of weeks.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Gabor
> >>>>>>>>
> >>>>>>>> On Tue, Nov 19, 2019 at 6:14 PM Driesprong, Fokko <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Thanks Gabor for the explanation. I'd like to change my vote to
> >>>>>>>>> +1 (non-binding).
> >>>>>>>>>
> >>>>>>>>> Cheers, Fokko
> >>>>>>>>>
> >>>>>>>>> On Tue, Nov 19, 2019 at 18:03, Ryan Blue <[email protected]> wrote:
> >>>>>>>>>> Gabor, what I meant was: have we tried this with real data to
> >>>>>>>>>> see the effect? I think those results would be helpful.
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Nov 18, 2019 at 11:35 PM Gabor Szadovszky <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Ryan,
> >>>>>>>>>>>
> >>>>>>>>>>> It is not easy to calculate. For the column indexes feature we
> >>>>>>>>>>> introduced two new structures saved before the footer: column
> >>>>>>>>>>> indexes and offset indexes. If the min/max values are not too
> >>>>>>>>>>> long, then the truncation might not decrease the file size,
> >>>>>>>>>>> because of the offset indexes. Moreover, we also introduced
> >>>>>>>>>>> parquet.page.row.count.limit, which might increase the number of
> >>>>>>>>>>> pages, which leads to increasing the file size.
> >>>>>>>>>>> The footer itself has also changed, and we are saving more
> >>>>>>>>>>> values in it: the offset values to the column/offset indexes,
> >>>>>>>>>>> the new logical type structures, the CRC checksums (we might
> >>>>>>>>>>> have some others).
> >>>>>>>>>>> So, the size of files with a small amount of data will be
> >>>>>>>>>>> increased (because of the larger footer). The size of files
> >>>>>>>>>>> where the values can be encoded very well (RLE) will probably be
> >>>>>>>>>>> increased (because we will have more pages). The size of some
> >>>>>>>>>>> files where the values are long (>64 bytes by default) might be
> >>>>>>>>>>> decreased because of truncating the min/max values.
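> >>>>>>>>>>>
> >>>>>>>>>>> (For anyone who wants to experiment, these are the knobs
> >>>>>>>>>>> involved; a sketch using the Hadoop config keys, set here to
> >>>>>>>>>>> their defaults:)
> >>>>>>>>>>>
> >>>>>>>>>>>   // on a Hadoop Configuration:
> >>>>>>>>>>>   conf.setInt("parquet.page.row.count.limit", 20000);
> >>>>>>>>>>>   conf.setInt("parquet.columnindex.truncate.length", 64);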
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Gabor
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Nov 18, 2019 at 6:46 PM Ryan Blue
> >>>>> <[email protected]
> >>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Gabor, do we have an idea of the additional overhead for a
> >>>>>>>>>>>> non-test data file? It should be easy to validate that this
> >>>>>>>>>>>> doesn't introduce an unreasonable amount of overhead. In some
> >>>>>>>>>>>> cases, it should actually be smaller, since the column indexes
> >>>>>>>>>>>> are truncated and page stats are not.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Mon, Nov 18, 2019 at 1:00 AM Gabor Szadovszky
> >>>>>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Fokko,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> For the first point: the referenced constructor is private,
> >>>>>>>>>>>>> and Iceberg uses it via reflection. It is not a breaking
> >>>>>>>>>>>>> change. I think parquet-mr should not keep private methods
> >>>>>>>>>>>>> around only because clients might use them via reflection.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> About the checksum: I agreed on having the CRC checksum write
> >>>>>>>>>>>>> enabled by default because the benchmarks did not show
> >>>>>>>>>>>>> significant performance penalties. See
> >>>>>>>>>>>>> https://github.com/apache/parquet-mr/pull/647 for details.
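> >>>>>>>>>>>>>
> >>>>>>>>>>>>> (If it turns out to matter for a given workload, it can be
> >>>>>>>>>>>>> turned off per job; a sketch, assuming the Hadoop config key
> >>>>>>>>>>>>> used by the write path:)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>   conf.setBoolean("parquet.page.write-checksum.enabled", false);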
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> About the file size change: 1.11.0 introduces column indexes,
> >>>>>>>>>>>>> CRC checksums, the removal of the statistics from the page
> >>>>>>>>>>>>> headers, and maybe other changes that impact file size. If
> >>>>>>>>>>>>> only file size is in question, I cannot see a breaking change
> >>>>>>>>>>>>> here.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>> Gabor
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Sun, Nov 17, 2019 at 9:27 PM Driesprong, Fokko <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Unfortunately, a -1 from my side (non-binding).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I've updated Iceberg to Parquet 1.11.0, and found three things:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> - We've broken backward compatibility of the constructor of
> >>>>>>>>>>>>>> ColumnChunkPageWriteStore
> >>>>>>>>>>>>>> <https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e#diff-d007a18083a2431c30a5416f248e0a4bR80>.
> >>>>>>>>>>>>>> This required a change
> >>>>>>>>>>>>>> <https://github.com/apache/incubator-iceberg/pull/297/files#diff-b877faa96f292b851c75fe8bcc1912f8R176>
> >>>>>>>>>>>>>> to the code. This isn't a hard blocker, but if there will be
> >>>>>>>>>>>>>> a new RC, I've submitted a patch:
> >>>>>>>>>>>>>> https://github.com/apache/parquet-mr/pull/699
> >>>>>>>>>>>>>> - Related, and something we need to put in the changelog:
> >>>>>>>>>>>>>> checksums are enabled by default
> >>>>>>>>>>>>>> <https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L54>.
> >>>>>>>>>>>>>> This will impact performance. I would suggest disabling it by
> >>>>>>>>>>>>>> default: https://github.com/apache/parquet-mr/pull/700
> >>>>>>>>>>>>>> <https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e#diff-d007a18083a2431c30a5416f248e0a4bR277>
> >>>>>>>>>>>>>> - Binary compatibility. While updating Iceberg, I've noticed
> >>>>>>>>>>>>>> that the split-test was failing:
> >>>>>>>>>>>>>> https://github.com/apache/incubator-iceberg/pull/297/files#diff-4b64b7014f259be41b26cfb73d3e6e93L199
> >>>>>>>>>>>>>> The two records are now divided over four Spark partitions.
> >>>>>>>>>>>>>> Something in the output has changed, since the files are
> >>>>>>>>>>>>>> bigger now. Does anyone have an idea of what's changed, or a
> >>>>>>>>>>>>>> way to check this? The only thing I can think of is the
> >>>>>>>>>>>>>> checksum mentioned above.
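> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> (One way to narrow it down, as a sketch: dump both footers
> >>>>>>>>>>>>>> with parquet-tools and diff them:)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> $ parquet-tools meta parquet-1-10-1.parquet > meta-1.10.txt
> >>>>>>>>>>>>>> $ parquet-tools meta parquet-1-11-0.parquet > meta-1.11.txt
> >>>>>>>>>>>>>> $ diff meta-1.10.txt meta-1.11.txt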
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> $ ls -lah ~/Desktop/parquet-1-1*
> >>>>>>>>>>>>>> -rw-r--r--  1 fokkodriesprong  staff   562B 17 nov 21:09 /Users/fokkodriesprong/Desktop/parquet-1-10-1.parquet
> >>>>>>>>>>>>>> -rw-r--r--  1 fokkodriesprong  staff   611B 17 nov 21:05 /Users/fokkodriesprong/Desktop/parquet-1-11-0.parquet
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> $ parquet-tools cat /Users/fokkodriesprong/Desktop/parquet-1-10-1.parquet
> >>>>>>>>>>>>>> id = 1
> >>>>>>>>>>>>>> data = a
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> $ parquet-tools cat /Users/fokkodriesprong/Desktop/parquet-1-11-0.parquet
> >>>>>>>>>>>>>> id = 1
> >>>>>>>>>>>>>> data = a
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> A binary diff here:
> >>>>>>>>>>>>>> https://gist.github.com/Fokko/1c209f158299dc2fb5878c5bae4bf6d8
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers, Fokko
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Sat, Nov 16, 2019 at 04:18, Junjie Chen <[email protected]> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> +1
> >>>>>>>>>>>>>>> Verified the signature and checksum, and ran mvn install
> >>>>>>>>>>>>>>> successfully.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Wang, Yuming <[email protected]> wrote on Thu, Nov 14, 2019 at 2:05 PM:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> +1
> >>>>>>>>>>>>>>>> Tested Parquet 1.11.0 with the Spark SQL module:
> >>>>>>>>>>>>>>>> build/sbt "sql/test-only" -Phadoop-3.2
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 2019/11/13, 21:33, "Gabor Szadovszky" <[email protected]> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I propose the following RC to be released as the official
> >>>>>>>>>>>>>>>> Apache Parquet 1.11.0 release.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The commit id is 18519eb8e059865652eee3ff0e8593f126701da4
> >>>>>>>>>>>>>>>> * This corresponds to the tag: apache-parquet-1.11.0-rc7
> >>>>>>>>>>>>>>>> * https://github.com/apache/parquet-mr/tree/18519eb8e059865652eee3ff0e8593f126701da4
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The release tarball, signature, and checksums are here:
> >>>>>>>>>>>>>>>> * https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-1.11.0-rc7
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> You can find the KEYS file here:
> >>>>>>>>>>>>>>>> * https://apache.org/dist/parquet/KEYS
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Binary artifacts are staged in Nexus here:
> >>>>>>>>>>>>>>>> * https://repository.apache.org/content/groups/staging/org/apache/parquet/
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> This release includes the changes listed at:
> >>>>>>>>>>>>>>>> https://github.com/apache/parquet-mr/blob/apache-parquet-1.11.0-rc7/CHANGES.md
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Please download, verify, and test.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Please vote in the next 72 hours.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> [ ] +1 Release this as Apache Parquet 1.11.0
> >>>>>>>>>>>>>>>> [ ] +0
> >>>>>>>>>>>>>>>> [ ] -1 Do not release this because...
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Ryan Blue
> >>>>>>>>>>>> Software Engineer
> >>>>>>>>>>>> Netflix
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Ryan Blue
> >>>>>>>>>> Software Engineer
> >>>>>>>>>> Netflix
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Ryan Blue
> >>>> Software Engineer
> >>>> Netflix
> >>>
> >>>
> >>
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
>
>
