Thanks for getting this done, Gabor!

On Fri, Dec 6, 2019 at 12:44 AM Gabor Szadovszky <[email protected]> wrote:

> Thanks, Julien and all of you who have voted.
> With three binding +1 votes and four non-binding +1 votes (no -1 votes)
> this release passes.
> I'll finalize the release in the next hour.
>
> Cheers,
> Gabor
>
> On Fri, Dec 6, 2019 at 12:12 AM Julien Le Dem <[email protected]> wrote:
>
> > I verified the signatures,
> > ran the build and tests.
> > It looks like the compatibility changes being discussed are not blockers.
> >
> > +1 (binding)
> >
> >
> > On Wed, Nov 27, 2019 at 1:43 AM Gabor Szadovszky <[email protected]> wrote:
> >
> > > Thanks, Zoltan.
> > >
> > > I also vote +1 (binding)
> > >
> > > Cheers,
> > > Gabor
> > >
> > > On Tue, Nov 26, 2019 at 1:46 PM Zoltan Ivanfi <[email protected]> wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > - I have read through the problem reports in this e-mail thread (one
> > > >   caused by the use of a private method via reflection and another one
> > > >   caused by having mixed versions of the libraries on the classpath) and
> > > >   I am convinced that they do not block the release.
> > > > - Signature and hash of the source tarball are valid.
> > > > - The specified git hash matches the specified git tag.
> > > > - The contents of the source tarball match the contents of the git repo
> > > >   at the specified tag.
> > > >
> > > > Br,
> > > >
> > > > Zoltan
> > > >
> > > >
> > > > On Tue, Nov 26, 2019 at 10:54 AM Gabor Szadovszky <[email protected]> wrote:
> > > >
> > > > > Created https://issues.apache.org/jira/browse/PARQUET-1703 to track this.
> > > > >
> > > > > Back to the RC. Anyone from the PMC willing to vote?
> > > > >
> > > > > Cheers,
> > > > > Gabor
> > > > >
> > > > > On Mon, Nov 25, 2019 at 6:45 PM Ryan Blue <[email protected]> wrote:
> > > > >
> > > > > > Gabor, good point about not being able to check new APIs. Updating
> > > > > > the comparison to the previous version would also allow us to get
> > > > > > rid of temporary exclusions, like the one you pointed out for
> > > > > > schema. It would be great to improve what we catch there.
> > > > > >
> > > > > > On Mon, Nov 25, 2019 at 1:56 AM Gabor Szadovszky <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Ryan,
> > > > > > >
> > > > > > > It is a different topic but I would like to respond briefly.
> > > > > > > I understand that 1.7.0 was the first Apache release. The problem
> > > > > > > with doing the compatibility checks against 1.7.0 is that we can
> > > > > > > easily add incompatibilities in APIs that were added after 1.7.0.
> > > > > > > For example: adding a new class for public use in 1.8.0, then
> > > > > > > removing it in 1.9.0. The compatibility check would not discover
> > > > > > > this breaking change. So, I think, a better approach would be to
> > > > > > > always compare to the previous minor release (e.g. comparing
> > > > > > > 1.9.0 to 1.8.0 etc.).
> > > > > > > As I've written before, even org/apache/parquet/schema/** is
> > > > > > > excluded from the compatibility check. As far as I know this is
> > > > > > > public API. So, I am not sure that only packages that are not
> > > > > > > part of the public API are excluded.
> > > > > > >
> > > > > > > Let's discuss this on the next Parquet sync.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Gabor
> > > > > > >
> > > > > > > On Fri, Nov 22, 2019 at 6:20 PM Ryan Blue <[email protected]> wrote:
> > > > > > >
> > > > > > > > Gabor,
> > > > > > > >
> > > > > > > > 1.7.0 was the first version using the org.apache.parquet
> > > > > > > > packages, so that's the correct base version for compatibility
> > > > > > > > checks. The exclusions in the POM are classes that the Parquet
> > > > > > > > community does not consider public. We rely on these checks to
> > > > > > > > highlight binary incompatibilities, and then we discuss them on
> > > > > > > > this list or in the dev sync. If the class is internal, we add
> > > > > > > > an exclusion for it.
> > > > > > > >
> > > > > > > > I know you're familiar with this process since we've talked
> > > > > > > > about it before. I also know that you'd rather have more strict
> > > > > > > > binary compatibility, but until we have someone with the time
> > > > > > > > to do some maintenance and build a public API module, I'm
> > > > > > > > afraid that's what we have to work with.
> > > > > > > >
> > > > > > > > Michael,
> > > > > > > >
> > > > > > > > I hope the context above is helpful and explains why running a
> > > > > > > > binary compatibility check tool will find incompatible changes.
> > > > > > > > We allow binary incompatible changes to internal classes and
> > > > > > > > modules, like parquet-common.
> > > > > > > >
> > > > > > > > On Fri, Nov 22, 2019 at 12:23 AM Gabor Szadovszky <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Ryan,
> > > > > > > > > I would not trust our compatibility checks (semver) too much.
> > > > > > > > > Currently, it is configured to compare our current version to
> > > > > > > > > 1.7.0. It means anything that was added since 1.7.0 and then
> > > > > > > > > broken in a later release won't be caught. In addition, many
> > > > > > > > > packages are excluded from the check for different reasons.
> > > > > > > > > For example, org/apache/parquet/schema/** is excluded, so if
> > > > > > > > > this really were an API compatibility issue we certainly
> > > > > > > > > wouldn't catch it.
> > > > > > > > >
> > > > > > > > > Michael,
> > > > > > > > > It fails because of a NoSuchMethodError pointing to a method
> > > > > > > > > that is newly introduced in 1.11. Both the caller and the
> > > > > > > > > callee are shipped by parquet-mr. So, I'm quite sure it is a
> > > > > > > > > classpath issue. It seems that the 1.11 version of the
> > > > > > > > > parquet-column jar is not on the classpath.
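> > > > > > > > >
> > > > > > > > > A quick way to confirm which jar a class comes from on the
> > > > > > > > > executor is to print its code source. A minimal sketch (the
> > > > > > > > > class name WhichJar and how you run it on the executor are
> > > > > > > > > up to you):
> > > > > > > > >
> > > > > > > > > import org.apache.parquet.schema.Types;
> > > > > > > > >
> > > > > > > > > public class WhichJar {
> > > > > > > > >   public static void main(String[] args) {
> > > > > > > > >     // Prints the jar URL that Types was loaded from; an old
> > > > > > > > >     // parquet-column jar shadowing 1.11.0 shows up here.
> > > > > > > > >     System.out.println(
> > > > > > > > >         Types.class.getProtectionDomain().getCodeSource().getLocation());
> > > > > > > > >   }
> > > > > > > > > }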
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Nov 22, 2019 at 1:44 AM Michael Heuer <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > The dependency versions are consistent in our artifact
> > > > > > > > > >
> > > > > > > > > > $ mvn dependency:tree | grep parquet
> > > > > > > > > > [INFO] |  \- org.apache.parquet:parquet-avro:jar:1.11.0:compile
> > > > > > > > > > [INFO] |     \- org.apache.parquet:parquet-format-structures:jar:1.11.0:compile
> > > > > > > > > > [INFO] |  +- org.apache.parquet:parquet-column:jar:1.11.0:compile
> > > > > > > > > > [INFO] |  |  +- org.apache.parquet:parquet-common:jar:1.11.0:compile
> > > > > > > > > > [INFO] |  |  \- org.apache.parquet:parquet-encoding:jar:1.11.0:compile
> > > > > > > > > > [INFO] |  +- org.apache.parquet:parquet-hadoop:jar:1.11.0:compile
> > > > > > > > > > [INFO] |  |  +- org.apache.parquet:parquet-jackson:jar:1.11.0:compile
> > > > > > > > > > The latter error
> > > > > > > > > >
> > > > > > > > > > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$PrimitiveBuilder.as(Lorg/apache/parquet/schema/LogicalTypeAnnotation;)Lorg/apache/parquet/schema/Types$Builder;
> > > > > > > > > >         at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:161)
> > > > > > > > > >
> > > > > > > > > > occurs when I attempt to run via spark-submit on Spark 2.4.4
> > > > > > > > > > $ spark-submit --version
> > > > > > > > > > Welcome to
> > > > > > > > > >       ____              __
> > > > > > > > > >      / __/__  ___ _____/ /__
> > > > > > > > > >     _\ \/ _ \/ _ `/ __/  '_/
> > > > > > > > > >    /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
> > > > > > > > > >       /_/
> > > > > > > > > >
> > > > > > > > > > Using Scala version 2.11.12, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_191
> > > > > > > > > > Branch
> > > > > > > > > > Compiled by user  on 2019-08-27T21:21:38Z
> > > > > > > > > > Revision
> > > > > > > > > > Url
> > > > > > > > > > Type --help for more information.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > On Nov 21, 2019, at 6:06 PM, Ryan Blue <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Thanks for looking into it, Nandor. That doesn't sound like a
> > > > > > > > > > > problem with Parquet, but a problem with the test environment,
> > > > > > > > > > > since parquet-avro depends on a newer API method.
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Nov 21, 2019 at 3:58 PM Nandor Kollar <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > >> I'm not sure that this is a binary compatibility issue. The
> > > > > > > > > > >> missing builder method was recently added in 1.11.0 with the
> > > > > > > > > > >> introduction of the new logical type API, while the original
> > > > > > > > > > >> version of this method (the one with a single OriginalType
> > > > > > > > > > >> input parameter, called before by AvroSchemaConverter) is
> > > > > > > > > > >> kept untouched. It seems to me that the Parquet versions on
> > > > > > > > > > >> the Spark executor mismatch: parquet-avro is on 1.11.0, but
> > > > > > > > > > >> parquet-column is still on an older version.
> > > > > > > > > > >>
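> > > > > > > > > > >> For context, the two overloads in question; a minimal sketch
> > > > > > > > > > >> against the 1.11.0 builder API (the class name and the string
> > > > > > > > > > >> column are just for illustration):
> > > > > > > > > > >>
> > > > > > > > > > >> import org.apache.parquet.schema.LogicalTypeAnnotation;
> > > > > > > > > > >> import org.apache.parquet.schema.OriginalType;
> > > > > > > > > > >> import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
> > > > > > > > > > >> import org.apache.parquet.schema.Type;
> > > > > > > > > > >> import org.apache.parquet.schema.Types;
> > > > > > > > > > >>
> > > > > > > > > > >> public class BuilderOverloads {
> > > > > > > > > > >>   public static void main(String[] args) {
> > > > > > > > > > >>     // Pre-1.11.0 overload, still present: takes an OriginalType.
> > > > > > > > > > >>     Type oldStyle = Types.optional(PrimitiveTypeName.BINARY)
> > > > > > > > > > >>         .as(OriginalType.UTF8).named("s");
> > > > > > > > > > >>     // Overload added in 1.11.0: takes a LogicalTypeAnnotation.
> > > > > > > > > > >>     // A 1.11.0 parquet-avro calling it against an older
> > > > > > > > > > >>     // parquet-column fails with exactly this NoSuchMethodError.
> > > > > > > > > > >>     Type newStyle = Types.optional(PrimitiveTypeName.BINARY)
> > > > > > > > > > >>         .as(LogicalTypeAnnotation.stringType()).named("s");
> > > > > > > > > > >>     System.out.println(oldStyle + "\n" + newStyle);
> > > > > > > > > > >>   }
> > > > > > > > > > >> }
> > > > > > > > > > >>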
> > > > > > > > > > >> On Thu, Nov 21, 2019 at 11:41 PM Michael Heuer <[email protected]> wrote:
> > > > > > > > > > >>
> > > > > > > > > > >>> Perhaps not strictly necessary to say, but if this particular
> > > > > > > > > > >>> compatibility break between 1.10 and 1.11 was intentional, and
> > > > > > > > > > >>> no other compatibility breaks are found, I would vote -1
> > > > > > > > > > >>> (non-binding) on this RC such that we might go back and
> > > > > > > > > > >>> revisit the changes to preserve compatibility.
> > > > > > > > > > >>>
> > > > > > > > > > >>> I am not sure there is presently enough motivation in the
> > > > > > > > > > >>> Spark project for a release after 2.4.4 and before 3.0 in
> > > > > > > > > > >>> which to bump the Parquet dependency version to 1.11.x.
> > > > > > > > > > >>>
> > > > > > > > > > >>>   michael
> > > > > > > > > > >>>
> > > > > > > > > > >>>
> > > > > > > > > > >>>> On Nov 21, 2019, at 11:01 AM, Ryan Blue <[email protected]> wrote:
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> Gabor, shouldn't Parquet be binary compatible for public
> > > > > > > > > > >>>> APIs? From the stack trace, it looks like this 1.11.0 RC
> > > > > > > > > > >>>> breaks binary compatibility in the type builders.
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> Looks like this should have been caught by the binary
> > > > > > > > > > >>>> compatibility checks.
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> On Thu, Nov 21, 2019 at 8:56 AM Gabor Szadovszky <[email protected]> wrote:
> > > > > > > > > > >>>>
> > > > > > > > > > >>>>> Hi Michael,
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>> Unfortunately, I don't have much experience with Spark. But
> > > > > > > > > > >>>>> if Spark uses the parquet-mr library in an embedded way
> > > > > > > > > > >>>>> (that's how Hive uses it), you would need to re-build Spark
> > > > > > > > > > >>>>> with the 1.11 RC parquet-mr.
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>> Regards,
> > > > > > > > > > >>>>> Gabor
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>> On Wed, Nov 20, 2019 at 5:44 PM Michael Heuer <[email protected]> wrote:
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>>> It appears a provided-scope dependency on spark-sql
> > > > > > > > > > >>>>>> leaking old parquet versions was causing the runtime error
> > > > > > > > > > >>>>>> below. After including new parquet-column and
> > > > > > > > > > >>>>>> parquet-hadoop compile scope dependencies (in addition to
> > > > > > > > > > >>>>>> parquet-avro, which we already have at compile scope), our
> > > > > > > > > > >>>>>> build succeeds.
> > > > > > > > > > >>>>>>
> > > > > > > > > > >>>>>> https://github.com/bigdatagenomics/adam/pull/2232
> > > > > > > > > > >>>>>> However, when running via spark-submit I run into a
> > > > > > > > > > >>>>>> similar runtime error
> > > > > > > > > > >>>>>> Caused by: java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$PrimitiveBuilder.as(Lorg/apache/parquet/schema/LogicalTypeAnnotation;)Lorg/apache/parquet/schema/Types$Builder;
> > > > > > > > > > >>>>>>       at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:161)
> > > > > > > > > > >>>>>>       at org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:226)
> > > > > > > > > > >>>>>>       at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:182)
> > > > > > > > > > >>>>>>       at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:141)
> > > > > > > > > > >>>>>>       at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:244)
> > > > > > > > > > >>>>>>       at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:135)
> > > > > > > > > > >>>>>>       at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:126)
> > > > > > > > > > >>>>>>       at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:121)
> > > > > > > > > > >>>>>>       at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:388)
> > > > > > > > > > >>>>>>       at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
> > > > > > > > > > >>>>>>       at org.apache.spark.rdd.InstrumentedOutputFormat.getRecordWriter(InstrumentedOutputFormat.scala:35)
> > > > > > > > > > >>>>>>       at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.initWriter(SparkHadoopWriter.scala:350)
> > > > > > > > > > >>>>>>       at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:120)
> > > > > > > > > > >>>>>>       at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
> > > > > > > > > > >>>>>>       at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
> > > > > > > > > > >>>>>>       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> > > > > > > > > > >>>>>>       at org.apache.spark.scheduler.Task.run(Task.scala:123)
> > > > > > > > > > >>>>>>       at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
> > > > > > > > > > >>>>>>       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> > > > > > > > > > >>>>>>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
> > > > > > > > > > >>>>>>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > > > > > > > > > >>>>>>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > > > > > > > > > >>>>>>       at java.lang.Thread.run(Thread.java:748)
> > > > > > > > > > >>>>>>
> > > > > > > > > > >>>>>>
> > > > > > > > > > >>>>>> Will bumping our library dependency version to 1.11
> > > > > > > > > > >>>>>> require a new version of Spark, built against Parquet 1.11?
> > > > > > > > > > >>>>>>
> > > > > > > > > > >>>>>> Please accept my apologies if this is heading out-of-scope
> > > > > > > > > > >>>>>> for the Parquet mailing list.
> > > > > > > > > > >>>>>>
> > > > > > > > > > >>>>>>  michael
> > > > > > > > > > >>>>>>
> > > > > > > > > > >>>>>>> On Nov 20, 2019, at 10:00 AM, Michael Heuer <[email protected]> wrote:
> > > > > > > > > > >>>>>>>
> > > > > > > > > > >>>>>>> I am willing to do some benchmarking on genomic data at
> > > > > > > > > > >>>>>>> scale but am not quite sure what the Spark target version
> > > > > > > > > > >>>>>>> for 1.11.0 might be. Will Parquet 1.11.0 be compatible
> > > > > > > > > > >>>>>>> with Spark 2.4.x?
> > > > > > > > > > >>>>>>>
> > > > > > > > > > >>>>>>> Updating from 1.10.1 to 1.11.0 breaks at runtime in our build
> > > > > > > > > > >>>>>>>
> > > > > > > > > > >>>>>>> …
> > > > > > > > > > >>>>>>> D 0, localhost, executor driver): java.lang.NoClassDefFoundError: org/apache/parquet/schema/LogicalTypeAnnotation
> > > > > > > > > > >>>>>>>     at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:121)
> > > > > > > > > > >>>>>>>     at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:388)
> > > > > > > > > > >>>>>>>     at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
> > > > > > > > > > >>>>>>>     at org.apache.spark.rdd.InstrumentedOutputFormat.getRecordWriter(InstrumentedOutputFormat.scala:35)
> > > > > > > > > > >>>>>>>     at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.initWriter(SparkHadoopWriter.scala:350)
> > > > > > > > > > >>>>>>>     at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:120)
> > > > > > > > > > >>>>>>>     at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
> > > > > > > > > > >>>>>>>     at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
> > > > > > > > > > >>>>>>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> > > > > > > > > > >>>>>>>     at org.apache.spark.scheduler.Task.run(Task.scala:123)
> > > > > > > > > > >>>>>>>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
> > > > > > > > > > >>>>>>>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> > > > > > > > > > >>>>>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
> > > > > > > > > > >>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > > > > > > > > > >>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > > > > > > > > > >>>>>>>     at java.lang.Thread.run(Thread.java:748)
> > > > > > > > > > >>>>>>> Caused by: java.lang.ClassNotFoundException: org.apache.parquet.schema.LogicalTypeAnnotation
> > > > > > > > > > >>>>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> > > > > > > > > > >>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> > > > > > > > > > >>>>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
> > > > > > > > > > >>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> > > > > > > > > > >>>>>>>
> > > > > > > > > > >>>>>>> michael
> > > > > > > > > > >>>>>>>
> > > > > > > > > > >>>>>>>
> > > > > > > > > > >>>>>>>> On Nov 20, 2019, at 3:29 AM, Gabor Szadovszky <[email protected]> wrote:
> > > > > > > > > > >>>>>>>>
> > > > > > > > > > >>>>>>>> Thanks, Fokko.
> > > > > > > > > > >>>>>>>>
> > > > > > > > > > >>>>>>>> Ryan, we did not do such measurements yet. I'm afraid I
> > > > > > > > > > >>>>>>>> won't have enough time to do that in the next couple of
> > > > > > > > > > >>>>>>>> weeks.
> > > > > > > > > > >>>>>>>>
> > > > > > > > > > >>>>>>>> Cheers,
> > > > > > > > > > >>>>>>>> Gabor
> > > > > > > > > > >>>>>>>>
> > > > > > > > > > >>>>>>>> On Tue, Nov 19, 2019 at 6:14 PM Driesprong, Fokko <[email protected]> wrote:
> > > > > > > > > > >>>>>>>>
> > > > > > > > > > >>>>>>>>> Thanks Gabor for the explanation. I'd like to change my
> > > > > > > > > > >>>>>>>>> vote to +1 (non-binding).
> > > > > > > > > > >>>>>>>>>
> > > > > > > > > > >>>>>>>>> Cheers, Fokko
> > > > > > > > > > >>>>>>>>>
> > > > > > > > > > >>>>>>>>> On Tue, Nov 19, 2019 at 6:03 PM Ryan Blue <[email protected]> wrote:
> > > > > > > > > > >>>>>>>>>
> > > > > > > > > > >>>>>>>>>> Gabor, what I meant was: have we tried this with real
> > > > > > > > > > >>>>>>>>>> data to see the effect? I think those results would be
> > > > > > > > > > >>>>>>>>>> helpful.
> > > > > > > > > > >>>>>>>>>>
> > > > > > > > > > >>>>>>>>>> On Mon, Nov 18, 2019 at 11:35 PM Gabor Szadovszky <[email protected]> wrote:
> > > > > > > > > > >>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>> Hi Ryan,
> > > > > > > > > > >>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>> It is not easy to calculate. For the column indexes
> > > > > > > > > > >>>>>>>>>>> feature we introduced two new structures saved before
> > > > > > > > > > >>>>>>>>>>> the footer: column indexes and offset indexes. If the
> > > > > > > > > > >>>>>>>>>>> min/max values are not too long, then the truncation
> > > > > > > > > > >>>>>>>>>>> might not decrease the file size because of the offset
> > > > > > > > > > >>>>>>>>>>> indexes. Moreover, we also introduced
> > > > > > > > > > >>>>>>>>>>> parquet.page.row.count.limit, which might increase the
> > > > > > > > > > >>>>>>>>>>> number of pages, which leads to increasing the file
> > > > > > > > > > >>>>>>>>>>> size.
> > > > > > > > > > >>>>>>>>>>> The footer itself has also changed and we are saving
> > > > > > > > > > >>>>>>>>>>> more values in it: the offset values to the
> > > > > > > > > > >>>>>>>>>>> column/offset indexes, the new logical type structures,
> > > > > > > > > > >>>>>>>>>>> the CRC checksums (we might have some others).
> > > > > > > > > > >>>>>>>>>>> So, the size of files with a small amount of data will
> > > > > > > > > > >>>>>>>>>>> be increased (because of the larger footer). The size
> > > > > > > > > > >>>>>>>>>>> of files where the values can be encoded very well
> > > > > > > > > > >>>>>>>>>>> (RLE) will probably be increased (because we will have
> > > > > > > > > > >>>>>>>>>>> more pages). The size of some files where the values
> > > > > > > > > > >>>>>>>>>>> are long (>64 bytes by default) might be decreased
> > > > > > > > > > >>>>>>>>>>> because of truncating the min/max values.
> > > > > > > > > > >>>>>>>>>>>
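> > > > > > > > > > >>>>>>>>>>> For anyone who wants to measure the effect, these
> > > > > > > > > > >>>>>>>>>>> knobs can be set per job. A minimal sketch, assuming
> > > > > > > > > > >>>>>>>>>>> the 1.11.0 property names (values shown are the
> > > > > > > > > > >>>>>>>>>>> defaults):
> > > > > > > > > > >>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>> import org.apache.hadoop.conf.Configuration;
> > > > > > > > > > >>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>> public class SizeKnobs {
> > > > > > > > > > >>>>>>>>>>>   public static Configuration defaults() {
> > > > > > > > > > >>>>>>>>>>>     Configuration conf = new Configuration();
> > > > > > > > > > >>>>>>>>>>>     // Page row count limit introduced with column indexes.
> > > > > > > > > > >>>>>>>>>>>     conf.setInt("parquet.page.row.count.limit", 20000);
> > > > > > > > > > >>>>>>>>>>>     // Truncation length for min/max values in column indexes.
> > > > > > > > > > >>>>>>>>>>>     conf.setInt("parquet.columnindex.truncate.length", 64);
> > > > > > > > > > >>>>>>>>>>>     // Pass conf to ParquetOutputFormat/ParquetWriter as usual.
> > > > > > > > > > >>>>>>>>>>>     return conf;
> > > > > > > > > > >>>>>>>>>>>   }
> > > > > > > > > > >>>>>>>>>>> }
> > > > > > > > > > >>>>>>>>>>>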
> > > > > > > > > > >>>>>>>>>>> Regards,
> > > > > > > > > > >>>>>>>>>>> Gabor
> > > > > > > > > > >>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>> On Mon, Nov 18, 2019 at 6:46 PM Ryan Blue <[email protected]> wrote:
> > > > > > > > > > >>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>> Gabor, do we have an idea of the additional overhead
> > > > > > > > > > >>>>>>>>>>>> for a non-test data file? It should be easy to
> > > > > > > > > > >>>>>>>>>>>> validate that this doesn't introduce an unreasonable
> > > > > > > > > > >>>>>>>>>>>> amount of overhead. In some cases, it should actually
> > > > > > > > > > >>>>>>>>>>>> be smaller since the column indexes are truncated and
> > > > > > > > > > >>>>>>>>>>>> page stats are not written.
> > > > > > > > > > >>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>> On Mon, Nov 18, 2019 at 1:00 AM Gabor Szadovszky <[email protected]> wrote:
> > > > > > > > > > >>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>> Hi Fokko,
> > > > > > > > > > >>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>> For the first point: the referenced constructor is
> > > > > > > > > > >>>>>>>>>>>>> private and Iceberg uses it via reflection. It is
> > > > > > > > > > >>>>>>>>>>>>> not a breaking change. I think parquet-mr shall not
> > > > > > > > > > >>>>>>>>>>>>> keep private methods only because clients might use
> > > > > > > > > > >>>>>>>>>>>>> them via reflection.
> > > > > > > > > > >>>>>>>>>>>>>
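> > > > > > > > > > >>>>>>>>>>>>> To make the failure mode concrete, a minimal sketch
> > > > > > > > > > >>>>>>>>>>>>> that lists the constructors actually present at
> > > > > > > > > > >>>>>>>>>>>>> runtime, so a signature change between releases is
> > > > > > > > > > >>>>>>>>>>>>> visible at a glance (the class name ListCtors is
> > > > > > > > > > >>>>>>>>>>>>> just for illustration):
> > > > > > > > > > >>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>> import java.lang.reflect.Constructor;
> > > > > > > > > > >>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>> public class ListCtors {
> > > > > > > > > > >>>>>>>>>>>>>   public static void main(String[] args) throws Exception {
> > > > > > > > > > >>>>>>>>>>>>>     // Reflective callers like Iceberg look up a private
> > > > > > > > > > >>>>>>>>>>>>>     // constructor by its exact parameter list; when an
> > > > > > > > > > >>>>>>>>>>>>>     // upgrade changes it, getDeclaredConstructor throws
> > > > > > > > > > >>>>>>>>>>>>>     // NoSuchMethodException at runtime only.
> > > > > > > > > > >>>>>>>>>>>>>     Class<?> target = Class.forName(
> > > > > > > > > > >>>>>>>>>>>>>         "org.apache.parquet.hadoop.ColumnChunkPageWriteStore");
> > > > > > > > > > >>>>>>>>>>>>>     for (Constructor<?> c : target.getDeclaredConstructors()) {
> > > > > > > > > > >>>>>>>>>>>>>       System.out.println(c);
> > > > > > > > > > >>>>>>>>>>>>>     }
> > > > > > > > > > >>>>>>>>>>>>>   }
> > > > > > > > > > >>>>>>>>>>>>> }
> > > > > > > > > > >>>>>>>>>>>>>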
> > > > > > > > > > >>>>>>>>>>>>> About the checksum: I agreed to having the CRC
> > > > > > > > > > >>>>>>>>>>>>> checksum write enabled by default because the
> > > > > > > > > > >>>>>>>>>>>>> benchmarks did not show significant performance
> > > > > > > > > > >>>>>>>>>>>>> penalties. See
> > > > > > > > > > >>>>>>>>>>>>> https://github.com/apache/parquet-mr/pull/647 for
> > > > > > > > > > >>>>>>>>>>>>> details.
> > > > > > > > > > >>>>>>>>>>>>>
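> > > > > > > > > > >>>>>>>>>>>>> For benchmarking both ways, the write-side checksums
> > > > > > > > > > >>>>>>>>>>>>> can be toggled per job. A minimal sketch, assuming
> > > > > > > > > > >>>>>>>>>>>>> the 1.11.0 property name
> > > > > > > > > > >>>>>>>>>>>>> parquet.page.write-checksum.enabled:
> > > > > > > > > > >>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>> import org.apache.hadoop.conf.Configuration;
> > > > > > > > > > >>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>> public class ChecksumToggle {
> > > > > > > > > > >>>>>>>>>>>>>   public static void main(String[] args) {
> > > > > > > > > > >>>>>>>>>>>>>     Configuration conf = new Configuration();
> > > > > > > > > > >>>>>>>>>>>>>     // Disable page-level CRC checksums on write (the
> > > > > > > > > > >>>>>>>>>>>>>     // 1.11.0 default is enabled); hand conf to
> > > > > > > > > > >>>>>>>>>>>>>     // ParquetOutputFormat as usual.
> > > > > > > > > > >>>>>>>>>>>>>     conf.setBoolean("parquet.page.write-checksum.enabled", false);
> > > > > > > > > > >>>>>>>>>>>>>     System.out.println(
> > > > > > > > > > >>>>>>>>>>>>>         conf.getBoolean("parquet.page.write-checksum.enabled", true));
> > > > > > > > > > >>>>>>>>>>>>>   }
> > > > > > > > > > >>>>>>>>>>>>> }
> > > > > > > > > > >>>>>>>>>>>>>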
> > > > > > > > > > >>>>>>>>>>>>> About the file size change: 1.11.0 is introducing
> > > > > > > > > > >>>>>>>>>>>>> column indexes and CRC checksums, removing the
> > > > > > > > > > >>>>>>>>>>>>> statistics from the page headers, and maybe other
> > > > > > > > > > >>>>>>>>>>>>> changes that impact file size. If only file size is
> > > > > > > > > > >>>>>>>>>>>>> in question, I cannot see a breaking change here.
> > > > > > > > > > >>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>> Regards,
> > > > > > > > > > >>>>>>>>>>>>> Gabor
> > > > > > > > > > >>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>> On Sun, Nov 17, 2019 at 9:27 PM Driesprong, Fokko <[email protected]> wrote:
> > > > > > > > > > >>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>> Unfortunately, a -1 from my side (non-binding)
> > > > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>> I've updated Iceberg to Parquet 1.11.0, and found three things:
> > > > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>> - We've broken backward compatibility of the constructor of
> > > > > > > > > > >>>>>>>>>>>>>> ColumnChunkPageWriteStore
> > > > > > > > > > >>>>>>>>>>>>>> <https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e#diff-d007a18083a2431c30a5416f248e0a4bR80>.
> > > > > > > > > > >>>>>>>>>>>>>> This required a change
> > > > > > > > > > >>>>>>>>>>>>>> <https://github.com/apache/incubator-iceberg/pull/297/files#diff-b877faa96f292b851c75fe8bcc1912f8R176>
> > > > > > > > > > >>>>>>>>>>>>>> to the code. This isn't a hard blocker, but if there will be
> > > > > > > > > > >>>>>>>>>>>>>> a new RC, I've submitted a patch:
> > > > > > > > > > >>>>>>>>>>>>>> https://github.com/apache/parquet-mr/pull/699
> > > > > > > > > > >>>>>>>>>>>>>> - Related, and something we need to put in the changelog:
> > > > > > > > > > >>>>>>>>>>>>>> checksums are enabled by default:
> > > > > > > > > > >>>>>>>>>>>>>> https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L54
> > > > > > > > > > >>>>>>>>>>>>>> This will impact performance. I would suggest disabling it
> > > > > > > > > > >>>>>>>>>>>>>> by default: https://github.com/apache/parquet-mr/pull/700
> > > > > > > > > > >>>>>>>>>>>>>> <https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e#diff-d007a18083a2431c30a5416f248e0a4bR277>
> > > > > > > > > > >>>>>>>>>>>>>> - Binary compatibility. While updating Iceberg, I've noticed
> > > > > > > > > > >>>>>>>>>>>>>> that the split-test was failing:
> > > > > > > > > > >>>>>>>>>>>>>> https://github.com/apache/incubator-iceberg/pull/297/files#diff-4b64b7014f259be41b26cfb73d3e6e93L199
> > > > > > > > > > >>>>>>>>>>>>>> The two records are now divided over four Spark partitions.
> > > > > > > > > > >>>>>>>>>>>>>> Something in the output has changed since the files are
> > > > > > > > > > >>>>>>>>>>>>>> bigger now. Has anyone any idea what's changed, or a way to
> > > > > > > > > > >>>>>>>>>>>>>> check this? The only thing I can think of is the checksum
> > > > > > > > > > >>>>>>>>>>>>>> mentioned above.
> > > > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>> $ ls -lah ~/Desktop/parquet-1-1*
> > > > > > > > > > >>>>>>>>>>>>>> -rw-r--r--  1 fokkodriesprong  staff   562B 17 nov 21:09 /Users/fokkodriesprong/Desktop/parquet-1-10-1.parquet
> > > > > > > > > > >>>>>>>>>>>>>> -rw-r--r--  1 fokkodriesprong  staff   611B 17 nov 21:05 /Users/fokkodriesprong/Desktop/parquet-1-11-0.parquet
> > > > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>> $ parquet-tools cat /Users/fokkodriesprong/Desktop/parquet-1-10-1.parquet
> > > > > > > > > > >>>>>>>>>>>>>> id = 1
> > > > > > > > > > >>>>>>>>>>>>>> data = a
> > > > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>> $ parquet-tools cat /Users/fokkodriesprong/Desktop/parquet-1-11-0.parquet
> > > > > > > > > > >>>>>>>>>>>>>> id = 1
> > > > > > > > > > >>>>>>>>>>>>>> data = a
> > > > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>> A binary diff here:
> > > > > > > > > > >>>>>>>>>>>>>> https://gist.github.com/Fokko/1c209f158299dc2fb5878c5bae4bf6d8
> > > > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>> Cheers, Fokko
> > > > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>> On Sat, Nov 16, 2019 at 4:18 AM Junjie Chen <[email protected]> wrote:
> > > > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>>> +1
> > > > > > > > > > >>>>>>>>>>>>>>> Verified signature, checksum and ran mvn install successfully.
> > > > > > > > > > >>>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>>> On Thu, Nov 14, 2019 at 2:05 PM Wang, Yuming <[email protected]> wrote:
> > > > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>>>> +1
> > > > > > > > > > >>>>>>>>>>>>>>>> Tested Parquet 1.11.0 with the Spark SQL module: build/sbt "sql/test-only" -Phadoop-3.2
> > > > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>>>> On 2019/11/13, 21:33, "Gabor Szadovszky" <[email protected]> wrote:
> > > > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>>>> Hi everyone,
> > > > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>>>> I propose the following RC to be released as the official
> > > > > > > > > > >>>>>>>>>>>>>>>> Apache Parquet 1.11.0 release.
> > > > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>>>> The commit id is 18519eb8e059865652eee3ff0e8593f126701da4
> > > > > > > > > > >>>>>>>>>>>>>>>> * This corresponds to the tag: apache-parquet-1.11.0-rc7
> > > > > > > > > > >>>>>>>>>>>>>>>> * https://github.com/apache/parquet-mr/tree/18519eb8e059865652eee3ff0e8593f126701da4
> > > > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>>>> The release tarball, signature, and checksums are here:
> > > > > > > > > > >>>>>>>>>>>>>>>> * https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-1.11.0-rc7
> > > > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>>>> You can find the KEYS file here:
> > > > > > > > > > >>>>>>>>>>>>>>>> * https://apache.org/dist/parquet/KEYS
> > > > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>>>> Binary artifacts are staged in Nexus here:
> > > > > > > > > > >>>>>>>>>>>>>>>> * https://repository.apache.org/content/groups/staging/org/apache/parquet/
> > > > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>>>> This release includes the changes listed at:
> > > > > > > > > > >>>>>>>>>>>>>>>> https://github.com/apache/parquet-mr/blob/apache-parquet-1.11.0-rc7/CHANGES.md
> > > > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>>>> Please download, verify, and test.
> > > > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>>>> Please vote in the next 72 hours.
> > > > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>>>> [ ] +1 Release this as Apache Parquet 1.11.0
> > > > > > > > > > >>>>>>>>>>>>>>>> [ ] +0
> > > > > > > > > > >>>>>>>>>>>>>>>> [ ] -1 Do not release this because...
> > > > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>> --
> > > > > > > > > > >>>>>>>>>>>> Ryan Blue
> > > > > > > > > > >>>>>>>>>>>> Software Engineer
> > > > > > > > > > >>>>>>>>>>>> Netflix
> > > > > > > > > > >>>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>
> > > > > > > > > > >>>>>>>>>>
> > > > > > > > > > >>>>>>>>>> --
> > > > > > > > > > >>>>>>>>>> Ryan Blue
> > > > > > > > > > >>>>>>>>>> Software Engineer
> > > > > > > > > > >>>>>>>>>> Netflix
> > > > > > > > > > >>>>>>>>>>
> > > > > > > > > > >>>>>>>>>
> > > > > > > > > > >>>>>>>
> > > > > > > > > > >>>>>>
> > > > > > > > > > >>>>>>
> > > > > > > > > > >>>>>
> > > > > > > > > > >>>>
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> --
> > > > > > > > > > >>>> Ryan Blue
> > > > > > > > > > >>>> Software Engineer
> > > > > > > > > > >>>> Netflix
> > > > > > > > > > >>>
> > > > > > > > > > >>>
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Ryan Blue
> > > > > > > > > > > Software Engineer
> > > > > > > > > > > Netflix
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Ryan Blue
> > > > > > > > Software Engineer
> > > > > > > > Netflix
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Ryan Blue
> > > > > > Software Engineer
> > > > > > Netflix
> > > > > >
> > > > >
> > > >
> > >
> >
>


-- 
Ryan Blue
Software Engineer
Netflix
