Thanks for getting this done, Gabor!

On Fri, Dec 6, 2019 at 12:44 AM Gabor Szadovszky <[email protected]> wrote:
Thanks, Julien and all of you who have voted.
With three binding +1 votes and four non-binding +1 votes (no -1 votes) this release passes. I'll finalize the release in the next hour.

Cheers,
Gabor

On Fri, Dec 6, 2019 at 12:12 AM Julien Le Dem <[email protected]> wrote:

I verified the signatures and ran the build and tests. It looks like the compatibility changes being discussed are not blockers.

+1 (binding)

On Wed, Nov 27, 2019 at 1:43 AM Gabor Szadovszky <[email protected]> wrote:

Thanks, Zoltan.

I also vote +1 (binding)

Cheers,
Gabor

On Tue, Nov 26, 2019 at 1:46 PM Zoltan Ivanfi <[email protected]> wrote:

+1 (binding)

- I have read through the problem reports in this e-mail thread (one caused by the use of a private method via reflection and another one caused by having mixed versions of the libraries on the classpath) and I am convinced that they do not block the release.
- Signature and hash of the source tarball are valid.
- The specified git hash matches the specified git tag.
- The contents of the source tarball match the contents of the git repo at the specified tag.

Br,

Zoltan

On Tue, Nov 26, 2019 at 10:54 AM Gabor Szadovszky <[email protected]> wrote:

Created https://issues.apache.org/jira/browse/PARQUET-1703 to track this.

Back to the RC. Anyone from the PMC willing to vote?

Cheers,
Gabor

On Mon, Nov 25, 2019 at 6:45 PM Ryan Blue <[email protected]> wrote:

Gabor, good point about not being able to check new APIs. Updating the previous version would also allow us to get rid of temporary exclusions, like the one you pointed out for schema. It would be great to improve what we catch there.

On Mon, Nov 25, 2019 at 1:56 AM Gabor Szadovszky <[email protected]> wrote:

Hi Ryan,

It is a different topic but I would like to reflect shortly.
I understand that 1.7.0 was the first Apache release. The problem with doing the compatibility checks against 1.7.0 is that we can easily add incompatibilities in APIs that were added after 1.7.0. For example: adding a new class for public use in 1.8.0, then removing it in 1.9.0. The compatibility check would not discover this breaking change. So, I think, a better approach would be to always compare to the previous minor release (e.g. comparing 1.9.0 to 1.8.0, etc.).
As I've written before, even org/apache/parquet/schema/** is excluded from the compatibility check. As far as I know this is public API. So, I am not sure that only packages that are not part of the public API are excluded.

Let's discuss this on the next parquet sync.
Regards,
Gabor

On Fri, Nov 22, 2019 at 6:20 PM Ryan Blue <[email protected]> wrote:

Gabor,

1.7.0 was the first version using the org.apache.parquet packages, so that's the correct base version for compatibility checks. The exclusions in the POM are classes that the Parquet community does not consider public. We rely on these checks to highlight binary incompatibilities, and then we discuss them on this list or in the dev sync. If the class is internal, we add an exclusion for it.

I know you're familiar with this process since we've talked about it before. I also know that you'd rather have more strict binary compatibility, but until we have someone with the time to do some maintenance and build a public API module, I'm afraid that's what we have to work with.

Michael,

I hope the context above is helpful and explains why running a binary compatibility check tool will find incompatible changes. We allow binary incompatible changes to internal classes and modules, like parquet-common.

On Fri, Nov 22, 2019 at 12:23 AM Gabor Szadovszky <[email protected]> wrote:

Ryan,
I would not trust our compatibility checks (semver) too much. Currently, it is configured to compare our current version to 1.7.0. It means anything that was added after 1.7.0 and then broken in a later release won't be caught. In addition, many packages are excluded from the check for different reasons. For example, org/apache/parquet/schema/** is excluded, so even if this really were an API compatibility issue we certainly wouldn't catch it.

Michael,
It fails because of a NoSuchMethodError pointing to a method that is newly introduced in 1.11. Both the caller and the callee are shipped by parquet-mr. So, I'm quite sure it is a classpath issue. It seems that the 1.11 version of the parquet-column jar is not on the classpath.
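One quick way to check a suspicion like Gabor's is to ask the JVM which jar a Parquet class was actually loaded from. A minimal sketch, not from the thread (Types is used only because it hosts the failing builder method):

    import org.apache.parquet.schema.Types;

    public class WhichParquetJar {
        public static void main(String[] args) {
            // The code source location is the jar (or directory) that provided the
            // class; if it is not parquet-column-1.11.0.jar, an older jar shadows it.
            System.out.println(Types.class.getProtectionDomain()
                    .getCodeSource().getLocation());
        }
    }

Running the same check inside a Spark executor shows what the executor classpath resolves to, which may differ from both the driver and the build.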
On Fri, Nov 22, 2019 at 1:44 AM Michael Heuer <[email protected]> wrote:

The dependency versions are consistent in our artifact

$ mvn dependency:tree | grep parquet
[INFO] |  \- org.apache.parquet:parquet-avro:jar:1.11.0:compile
[INFO] |     \- org.apache.parquet:parquet-format-structures:jar:1.11.0:compile
[INFO] |  +- org.apache.parquet:parquet-column:jar:1.11.0:compile
[INFO] |  |  +- org.apache.parquet:parquet-common:jar:1.11.0:compile
[INFO] |  |  \- org.apache.parquet:parquet-encoding:jar:1.11.0:compile
[INFO] |  +- org.apache.parquet:parquet-hadoop:jar:1.11.0:compile
[INFO] |  |  +- org.apache.parquet:parquet-jackson:jar:1.11.0:compile

The latter error

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$PrimitiveBuilder.as(Lorg/apache/parquet/schema/LogicalTypeAnnotation;)Lorg/apache/parquet/schema/Types$Builder;
  at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:161)

occurs when I attempt to run via spark-submit on Spark 2.4.4

$ spark-submit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Scala version 2.11.12, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_191
Branch
Compiled by user on 2019-08-27T21:21:38Z
Revision
Url
Type --help for more information.

On Nov 21, 2019, at 6:06 PM, Ryan Blue <[email protected]> wrote:

Thanks for looking into it, Nandor. That doesn't sound like a problem with Parquet, but a problem with the test environment, since parquet-avro depends on a newer API method.

On Thu, Nov 21, 2019 at 3:58 PM Nandor Kollar <[email protected]> wrote:

I'm not sure that this is a binary compatibility issue. The missing builder method was recently added in 1.11.0 with the introduction of the new logical type API, while the original version of this method (the one with a single OriginalType input parameter, called before by AvroSchemaConverter) is kept untouched. It seems to me that the Parquet versions on the Spark executor mismatch: parquet-avro is on 1.11.0, but parquet-column is still on an older version.
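Nandor's point about the two overloads can be made concrete with the builder API itself. A minimal sketch, assuming parquet-column 1.11.0 is on the classpath (the field name "data" is arbitrary):

    import org.apache.parquet.schema.LogicalTypeAnnotation;
    import org.apache.parquet.schema.OriginalType;
    import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
    import org.apache.parquet.schema.Type;
    import org.apache.parquet.schema.Types;

    public class BuilderOverloads {
        public static void main(String[] args) {
            // New overload, introduced in 1.11.0 with the logical type API. This is
            // the method AvroSchemaConverter now calls, hence the NoSuchMethodError
            // when an older parquet-column wins on the classpath.
            Type viaAnnotation = Types.required(PrimitiveTypeName.BINARY)
                .as(LogicalTypeAnnotation.stringType())
                .named("data");

            // Pre-1.11.0 overload, kept untouched.
            Type viaOriginalType = Types.required(PrimitiveTypeName.BINARY)
                .as(OriginalType.UTF8)
                .named("data");

            System.out.println(viaAnnotation + "\n" + viaOriginalType);
        }
    }

Code compiled against 1.11.0 refers to the first overload in its class files, so running it against an older parquet-column fails only at the call site, exactly as in the stack trace above.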
On Thu, Nov 21, 2019 at 11:41 PM Michael Heuer <[email protected]> wrote:

Perhaps not strictly necessary to say, but if this particular compatibility break between 1.10 and 1.11 was intentional, and no other compatibility breaks are found, I would vote -1 (non-binding) on this RC, such that we might go back and revisit the changes to preserve compatibility.

I am not sure there is presently enough motivation in the Spark project for a release after 2.4.4 and before 3.0 in which to bump the Parquet dependency version to 1.11.x.

michael

On Nov 21, 2019, at 11:01 AM, Ryan Blue <[email protected]> wrote:

Gabor, shouldn't Parquet be binary compatible for public APIs? From the stack trace, it looks like this 1.11.0 RC breaks binary compatibility in the type builders.

Looks like this should have been caught by the binary compatibility checks.

On Thu, Nov 21, 2019 at 8:56 AM Gabor Szadovszky <[email protected]> wrote:

Hi Michael,

Unfortunately, I don't have too much experience with Spark. But if Spark uses the parquet-mr library in an embedded way (that's how Hive uses it), it is required to re-build Spark with the 1.11 RC of parquet-mr.
Regards,
Gabor

On Wed, Nov 20, 2019 at 5:44 PM Michael Heuer <[email protected]> wrote:

It appears a provided-scope dependency on spark-sql that leaks old Parquet versions was causing the runtime error below. After including new parquet-column and parquet-hadoop compile-scope dependencies (in addition to parquet-avro, which we already have at compile scope), our build succeeds.

https://github.com/bigdatagenomics/adam/pull/2232

However, when running via spark-submit I run into a similar runtime error

Caused by: java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$PrimitiveBuilder.as(Lorg/apache/parquet/schema/LogicalTypeAnnotation;)Lorg/apache/parquet/schema/Types$Builder;
  at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:161)
  at org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:226)
  at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:182)
  at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:141)
  at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:244)
  at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:135)
  at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:126)
  at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:121)
  at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:388)
  at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
  at org.apache.spark.rdd.InstrumentedOutputFormat.getRecordWriter(InstrumentedOutputFormat.scala:35)
  at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.initWriter(SparkHadoopWriter.scala:350)
  at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:120)
  at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
  at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  at org.apache.spark.scheduler.Task.run(Task.scala:123)
  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)

Will bumping our library dependency version to 1.11 require a new version of Spark, built against Parquet 1.11?

Please accept my apologies if this is heading out-of-scope for the Parquet mailing list.

michael

On Nov 20, 2019, at 10:00 AM, Michael Heuer <[email protected]> wrote:

I am willing to do some benchmarking on genomic data at scale but am not quite sure what the Spark target version for 1.11.0 might be. Will Parquet 1.11.0 be compatible with Spark 2.4.x?
Updating from 1.10.1 to 1.11.0 breaks at runtime in our build

… (TID 0, localhost, executor driver): java.lang.NoClassDefFoundError: org/apache/parquet/schema/LogicalTypeAnnotation
  at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:121)
  at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:388)
  at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
  at org.apache.spark.rdd.InstrumentedOutputFormat.getRecordWriter(InstrumentedOutputFormat.scala:35)
  at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.initWriter(SparkHadoopWriter.scala:350)
  at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:120)
  at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
  at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  at org.apache.spark.scheduler.Task.run(Task.scala:123)
  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.parquet.schema.LogicalTypeAnnotation
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

michael

On Nov 20, 2019, at 3:29 AM, Gabor Szadovszky <[email protected]> wrote:

Thanks, Fokko.

Ryan, we did not do such measurements yet. I'm afraid I won't have enough time to do that in the next couple of weeks.

Cheers,
Gabor

On Tue, Nov 19, 2019 at 6:14 PM Driesprong, Fokko <[email protected]> wrote:

Thanks Gabor for the explanation. I'd like to change my vote to +1 (non-binding).

Cheers, Fokko

On Tue, Nov 19, 2019 at 6:03 PM, Ryan Blue <[email protected]> wrote:

Gabor, what I meant was: have we tried this with real data to see the effect? I think those results would be helpful.

On Mon, Nov 18, 2019 at 11:35 PM Gabor Szadovszky <[email protected]> wrote:

Hi Ryan,

It is not easy to calculate. For the column indexes feature we introduced two new structures saved before the footer: column indexes and offset indexes.
If the min/max values are not too long, then the truncation might not decrease the file size because of the offset indexes. Moreover, we also introduced parquet.page.row.count.limit, which might increase the number of pages, which in turn increases the file size.
The footer itself has also changed, and we are saving more values in it: the offset values to the column/offset indexes, the new logical type structures, the CRC checksums (we might have some others).
So, the size of files with a small amount of data will increase (because of the larger footer). The size of files whose values can be encoded very well (RLE) will probably increase (because we will have more pages). The size of some files where the values are long (>64 bytes by default) might decrease because of the truncation of the min/max values.

Regards,
Gabor
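For reference, the write-path knobs Gabor describes are plain Hadoop configuration. A hedged sketch (parquet.page.row.count.limit is quoted from his message and 20000 is its default in this RC; the checksum property name below is an assumption for illustration):

    import org.apache.hadoop.conf.Configuration;

    public class WriteKnobs {
        public static void main(String[] args) {
            // This Configuration would be handed to the Parquet output format / writer.
            Configuration conf = new Configuration();
            // Cap on rows per page (new in 1.11.0): lower values mean more pages,
            // hence larger files for well-compressed (RLE) columns.
            conf.setInt("parquet.page.row.count.limit", 20000);
            // Page-level CRC checksums (assumed property name), enabled by default
            // in this RC; another contributor to file size.
            conf.setBoolean("parquet.page.write-checksum.enabled", false);
        }
    }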
On Mon, Nov 18, 2019 at 6:46 PM Ryan Blue <[email protected]> wrote:

Gabor, do we have an idea of the additional overhead for a non-test data file? It should be easy to validate that this doesn't introduce an unreasonable amount of overhead. In some cases, it should actually be smaller, since the column indexes are truncated and page stats are no longer written.

On Mon, Nov 18, 2019 at 1:00 AM Gabor Szadovszky <[email protected]> wrote:

Hi Fokko,

For the first point: the referenced constructor is private and Iceberg uses it via reflection. It is not a breaking change. I think parquet-mr shall not keep private methods only because clients might use them via reflection.

About the checksum: I've agreed on having the CRC checksum write enabled by default because the benchmarks did not show significant performance penalties. See https://github.com/apache/parquet-mr/pull/647 for details.

About the file size change: 1.11.0 is introducing column indexes and CRC checksums, removing the statistics from the page headers, and maybe other changes that impact file size. If only file size is in question, I cannot see a breaking change here.

Regards,
Gabor
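The reflective access Gabor refers to looks roughly like the following. A minimal sketch, not Iceberg's actual code (no constructor signature is assumed; it only lists what the local parquet-hadoop declares, which is exactly what a reflective caller silently pins itself to):

    import java.lang.reflect.Constructor;

    public class PrivateCtorProbe {
        public static void main(String[] args) throws Exception {
            // The class is not public API, so it is looked up by name.
            Class<?> clazz = Class.forName(
                "org.apache.parquet.hadoop.ColumnChunkPageWriteStore");
            // Print the declared constructors: a reflective caller depends on one
            // of these signatures and breaks only at runtime when it changes.
            for (Constructor<?> ctor : clazz.getDeclaredConstructors()) {
                System.out.println(ctor);
            }
        }
    }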
On Sun, Nov 17, 2019 at 9:27 PM Driesprong, Fokko <[email protected]> wrote:

Unfortunately, a -1 from my side (non-binding)

I've updated Iceberg to Parquet 1.11.0, and found three things:

- We've broken backward compatibility of the constructor of ColumnChunkPageWriteStore
  <https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e#diff-d007a18083a2431c30a5416f248e0a4bR80>.
  This required a change to the code
  <https://github.com/apache/incubator-iceberg/pull/297/files#diff-b877faa96f292b851c75fe8bcc1912f8R176>.
  This isn't a hard blocker, but if there will be a new RC, I've submitted a patch: https://github.com/apache/parquet-mr/pull/699
- Related, and something we need to put in the changelog: checksums are enabled by default
  <https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L54>
  <https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e#diff-d007a18083a2431c30a5416f248e0a4bR277>.
  This will impact performance. I would suggest disabling it by default: https://github.com/apache/parquet-mr/pull/700
- Binary compatibility.
  While updating Iceberg, I've noticed that the split-test was failing
  <https://github.com/apache/incubator-iceberg/pull/297/files#diff-4b64b7014f259be41b26cfb73d3e6e93L199>.
  The two records are now divided over four Spark partitions. Something in the output has changed, since the files are bigger now. Has anyone any idea what changed, or a way to check this? The only thing I can think of is the checksum mentioned above.

$ ls -lah ~/Desktop/parquet-1-1*
-rw-r--r--  1 fokkodriesprong  staff  562B 17 nov 21:09 /Users/fokkodriesprong/Desktop/parquet-1-10-1.parquet
-rw-r--r--  1 fokkodriesprong  staff  611B 17 nov 21:05 /Users/fokkodriesprong/Desktop/parquet-1-11-0.parquet

$ parquet-tools cat /Users/fokkodriesprong/Desktop/parquet-1-10-1.parquet
id = 1
data = a

$ parquet-tools cat /Users/fokkodriesprong/Desktop/parquet-1-11-0.parquet
id = 1
data = a

A binary diff here: https://gist.github.com/Fokko/1c209f158299dc2fb5878c5bae4bf6d8

Cheers, Fokko
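One hedged way to approach Fokko's "what changed" question, beyond a raw binary diff, is to dump and compare the footers of the two files, since the additions discussed in this thread (column/offset index offsets, logical type structures, CRC fields) are carried in or next to the footer. A sketch, not from the thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.hadoop.ParquetFileReader;
    import org.apache.parquet.hadoop.util.HadoopInputFile;

    public class FooterDump {
        public static void main(String[] args) throws Exception {
            for (String file : args) {
                try (ParquetFileReader reader = ParquetFileReader.open(
                        HadoopInputFile.fromPath(new Path(file), new Configuration()))) {
                    // Diffing the two dumps shows which metadata structures grew
                    // between 1.10.1 and 1.11.0.
                    System.out.println(file + ":\n" + reader.getFooter());
                }
            }
        }
    }

Run it against the two files from the listing above.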
On Sat, Nov 16, 2019 at 4:18 AM, Junjie Chen <[email protected]> wrote:

+1
Verified signature, checksum and ran mvn install successfully.

On Thu, Nov 14, 2019 at 2:05 PM, Wang, Yuming <[email protected]> wrote:

+1
Tested Parquet 1.11.0 with the Spark SQL module: build/sbt "sql/test-only" -Phadoop-3.2

On 2019/11/13, 21:33, "Gabor Szadovszky" <[email protected]> wrote:

Hi everyone,

I propose the following RC to be released as official Apache Parquet 1.11.0 release.

The commit id is 18519eb8e059865652eee3ff0e8593f126701da4
* This corresponds to the tag: apache-parquet-1.11.0-rc7
* https://github.com/apache/parquet-mr/tree/18519eb8e059865652eee3ff0e8593f126701da4

The release tarball, signature, and checksums are here:
* https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-1.11.0-rc7

You can find the KEYS file here:
* https://apache.org/dist/parquet/KEYS

Binary artifacts are staged in Nexus here:
* https://repository.apache.org/content/groups/staging/org/apache/parquet/

This release includes the changes listed at:
* https://github.com/apache/parquet-mr/blob/apache-parquet-1.11.0-rc7/CHANGES.md

Please download, verify, and test.

Please vote in the next 72 hours.

[ ] +1 Release this as Apache Parquet 1.11.0
[ ] +0
[ ] -1 Do not release this because...
