Clirr fails the binary compatibility check against 1.10.1

parquet-mr (HEAD detached at apache-parquet-1.11.0-rc7)
$ mvn clirr:check -DcomparisonArtifacts=1.10.1
…
[INFO] --- clirr-maven-plugin:2.6.1:check (default-cli) @ parquet-common ---
[INFO] artifact org.apache.parquet:parquet-common: checking for updates from jitpack.io
[INFO] artifact org.apache.parquet:parquet-common: checking for updates from central
[INFO] Comparing to version: 1.10.1
[ERROR] 7009: org.apache.parquet.bytes.ByteBufferInputStream: Accessibility of method 'public ByteBufferInputStream()' has been decreased from public to package
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Apache Parquet MR 1.11.0:
[INFO]
[INFO] Apache Parquet MR .................................. SUCCESS [  2.052 s]
[INFO] Apache Parquet Format Structures ................... SUCCESS [  7.035 s]
[INFO] Apache Parquet Generator ........................... SUCCESS [  1.872 s]
[INFO] Apache Parquet Common .............................. FAILURE [  1.478 s]
...
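For context, clirr error 7009 means that code compiled against 1.10.1, where the no-arg constructor is public, can no longer link it at runtime against 1.11.0. A minimal sketch of what breaks (hypothetical caller; it assumes only what the clirr message above states):

    import org.apache.parquet.bytes.ByteBufferInputStream;

    public class ConstructorProbe {
        public static void main(String[] args) {
            // Compiles against parquet-common 1.10.1, where this constructor is
            // public. Run against 1.11.0, where it is package-private, and the
            // JVM rejects the link with java.lang.IllegalAccessError.
            ByteBufferInputStream in = new ByteBufferInputStream();
            System.out.println(in);
        }
    }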
> On Nov 22, 2019, at 2:23 AM, Gabor Szadovszky <[email protected]> wrote:
>
> Ryan,
> I would not trust our compatibility checks (semver) too much. Currently, they are configured to compare our current version to 1.7.0, which means anything that was added after 1.7.0 and then broken in a later release won't be caught. In addition, many packages are excluded from the check for various reasons. For example, org/apache/parquet/schema/** is excluded, so if this really were an API compatibility issue, we certainly wouldn't catch it.
>
> Michael,
> It fails because of a NoSuchMethodError pointing to a method that is newly introduced in 1.11. Both the caller and the callee are shipped by parquet-mr, so I'm quite sure it is a classpath issue. It seems that the 1.11 version of the parquet-column jar is not on the classpath.
>
> On Fri, Nov 22, 2019 at 1:44 AM Michael Heuer <[email protected]> wrote:
>
>> The dependency versions are consistent in our artifact
>>
>> $ mvn dependency:tree | grep parquet
>> [INFO] |  \- org.apache.parquet:parquet-avro:jar:1.11.0:compile
>> [INFO] |     \- org.apache.parquet:parquet-format-structures:jar:1.11.0:compile
>> [INFO] |  +- org.apache.parquet:parquet-column:jar:1.11.0:compile
>> [INFO] |  |  +- org.apache.parquet:parquet-common:jar:1.11.0:compile
>> [INFO] |  |  \- org.apache.parquet:parquet-encoding:jar:1.11.0:compile
>> [INFO] |  +- org.apache.parquet:parquet-hadoop:jar:1.11.0:compile
>> [INFO] |  |  +- org.apache.parquet:parquet-jackson:jar:1.11.0:compile
>>
>> The latter error
>>
>> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$PrimitiveBuilder.as(Lorg/apache/parquet/schema/LogicalTypeAnnotation;)Lorg/apache/parquet/schema/Types$Builder;
>>         at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:161)
>>
>> occurs when I attempt to run via spark-submit on Spark 2.4.4
>>
>> $ spark-submit --version
>> Welcome to
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/  '_/
>>    /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
>>       /_/
>>
>> Using Scala version 2.11.12, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_191
>> Branch
>> Compiled by user on 2019-08-27T21:21:38Z
>> Revision
>> Url
>> Type --help for more information.
>>
>>> On Nov 21, 2019, at 6:06 PM, Ryan Blue <[email protected]> wrote:
>>>
>>> Thanks for looking into it, Nandor. That doesn't sound like a problem with Parquet, but a problem with the test environment, since parquet-avro depends on a newer API method.
>>>
>>> On Thu, Nov 21, 2019 at 3:58 PM Nandor Kollar <[email protected]> wrote:
>>>
>>>> I'm not sure that this is a binary compatibility issue. The missing builder method was recently added in 1.11.0 with the introduction of the new logical type API, while the original version of this method (the one with a single OriginalType input parameter, called before by AvroSchemaConverter) is kept untouched. It seems to me that the Parquet versions on the Spark executor mismatch: parquet-avro is on 1.11.0, but parquet-column is still on an older version.
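A minimal sketch of the two builder overloads Nandor describes (the field name is hypothetical). parquet-avro 1.11.0 is compiled against the second form, so an executor classpath with an older parquet-column cannot resolve it and throws the NoSuchMethodError above:

    import org.apache.parquet.schema.LogicalTypeAnnotation;
    import org.apache.parquet.schema.OriginalType;
    import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
    import org.apache.parquet.schema.Type;
    import org.apache.parquet.schema.Types;

    public class BuilderOverloads {
        public static void main(String[] args) {
            // Pre-1.11 overload, kept untouched: takes an OriginalType.
            Type oldStyle = Types.optional(PrimitiveTypeName.BINARY)
                .as(OriginalType.UTF8)
                .named("data");

            // New in 1.11.0: takes a LogicalTypeAnnotation. Resolving this call
            // against an older parquet-column jar fails with NoSuchMethodError.
            Type newStyle = Types.optional(PrimitiveTypeName.BINARY)
                .as(LogicalTypeAnnotation.stringType())
                .named("data");

            System.out.println(oldStyle + " / " + newStyle);
        }
    }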
>>>>
>>>> On Thu, Nov 21, 2019 at 11:41 PM Michael Heuer <[email protected]> wrote:
>>>>
>>>>> Perhaps not strictly necessary to say, but if this particular compatibility break between 1.10 and 1.11 was intentional, and no other compatibility breaks are found, I would vote -1 (non-binding) on this RC, so that we might go back and revisit the changes to preserve compatibility.
>>>>>
>>>>> I am not sure there is presently enough motivation in the Spark project for a release after 2.4.4 and before 3.0 in which to bump the Parquet dependency version to 1.11.x.
>>>>>
>>>>> michael
>>>>>
>>>>>> On Nov 21, 2019, at 11:01 AM, Ryan Blue <[email protected]> wrote:
>>>>>>
>>>>>> Gabor, shouldn't Parquet be binary compatible for public APIs? From the stack trace, it looks like this 1.11.0 RC breaks binary compatibility in the type builders.
>>>>>>
>>>>>> Looks like this should have been caught by the binary compatibility checks.
>>>>>>
>>>>>> On Thu, Nov 21, 2019 at 8:56 AM Gabor Szadovszky <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Michael,
>>>>>>>
>>>>>>> Unfortunately, I don't have much experience with Spark. But if Spark uses the parquet-mr library in an embedded way (that's how Hive uses it), Spark would need to be re-built with the 1.11 RC parquet-mr.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Gabor
>>>>>>>
>>>>>>> On Wed, Nov 20, 2019 at 5:44 PM Michael Heuer <[email protected]> wrote:
>>>>>>>
>>>>>>>> It appears that a provided-scope dependency on spark-sql, which leaks old Parquet versions, was causing the runtime error below. After including new parquet-column and parquet-hadoop compile-scope dependencies (in addition to parquet-avro, which we already have at compile scope), our build succeeds.
>>>>>>>>
>>>>>>>> https://github.com/bigdatagenomics/adam/pull/2232
>>>>>>>>
>>>>>>>> However, when running via spark-submit I run into a similar runtime error
>>>>>>>>
>>>>>>>> Caused by: java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$PrimitiveBuilder.as(Lorg/apache/parquet/schema/LogicalTypeAnnotation;)Lorg/apache/parquet/schema/Types$Builder;
>>>>>>>>         at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:161)
>>>>>>>>         at org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:226)
>>>>>>>>         at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:182)
>>>>>>>>         at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:141)
>>>>>>>>         at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:244)
>>>>>>>>         at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:135)
>>>>>>>>         at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:126)
>>>>>>>>         at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:121)
>>>>>>>>         at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:388)
>>>>>>>>         at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
>>>>>>>>         at org.apache.spark.rdd.InstrumentedOutputFormat.getRecordWriter(InstrumentedOutputFormat.scala:35)
>>>>>>>>         at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.initWriter(SparkHadoopWriter.scala:350)
>>>>>>>>         at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:120)
>>>>>>>>         at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
>>>>>>>>         at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
>>>>>>>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>>>>>>>>         at org.apache.spark.scheduler.Task.run(Task.scala:123)
>>>>>>>>         at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>>>>>>>>         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>>>>>>>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>         at java.lang.Thread.run(Thread.java:748)
>>>>>>>>
>>>>>>>> Will bumping our library dependency version to 1.11 require a new version of Spark, built against Parquet 1.11?
>>>>>>>>
>>>>>>>> Please accept my apologies if this is heading out-of-scope for the Parquet mailing list.
>>>>>>>>
>>>>>>>> michael
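Given Gabor's diagnosis upthread (an older parquet-column jar shadowing 1.11.0 at runtime), one quick check is to ask the JVM which jar actually provided a Parquet class. A diagnostic sketch, not from the thread; the class choice is arbitrary:

    import org.apache.parquet.schema.Types;

    public class WhichParquetColumn {
        public static void main(String[] args) {
            // Prints the jar the classloader resolved for parquet-column, e.g.
            // .../parquet-column-1.11.0.jar. getCodeSource() can be null for
            // bootstrap-loaded classes, but not for an ordinary application jar.
            System.out.println(
                Types.class.getProtectionDomain().getCodeSource().getLocation());
        }
    }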
>>>>>>>>
>>>>>>>>> On Nov 20, 2019, at 10:00 AM, Michael Heuer <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> I am willing to do some benchmarking on genomic data at scale but am not quite sure what the Spark target version for 1.11.0 might be. Will Parquet 1.11.0 be compatible with Spark 2.4.x?
>>>>>>>>>
>>>>>>>>> Updating from 1.10.1 to 1.11.0 breaks at runtime in our build
>>>>>>>>>
>>>>>>>>> …
>>>>>>>>> D 0, localhost, executor driver): java.lang.NoClassDefFoundError: org/apache/parquet/schema/LogicalTypeAnnotation
>>>>>>>>>         at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:121)
>>>>>>>>>         at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:388)
>>>>>>>>>         at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
>>>>>>>>>         at org.apache.spark.rdd.InstrumentedOutputFormat.getRecordWriter(InstrumentedOutputFormat.scala:35)
>>>>>>>>>         at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.initWriter(SparkHadoopWriter.scala:350)
>>>>>>>>>         at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:120)
>>>>>>>>>         at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
>>>>>>>>>         at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
>>>>>>>>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>>>>>>>>>         at org.apache.spark.scheduler.Task.run(Task.scala:123)
>>>>>>>>>         at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>>>>>>>>>         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>>>>>>>>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>>         at java.lang.Thread.run(Thread.java:748)
>>>>>>>>> Caused by: java.lang.ClassNotFoundException: org.apache.parquet.schema.LogicalTypeAnnotation
>>>>>>>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>>>>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>>>>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>>>>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>>>>>
>>>>>>>>> michael
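The NoClassDefFoundError above is the same classpath symptom in another guise: org.apache.parquet.schema.LogicalTypeAnnotation only exists from 1.11.0 on. A one-line probe (a sketch, not from the thread) that can be run on the driver or inside a task to check which side of that line a classpath falls on:

    public class LogicalTypeProbe {
        public static void main(String[] args) throws Exception {
            // Succeeds on a 1.11.0 classpath; throws ClassNotFoundException when
            // an older parquet-column shadows the 1.11.0 jar.
            Class.forName("org.apache.parquet.schema.LogicalTypeAnnotation");
            System.out.println("parquet-column >= 1.11.0 is visible");
        }
    }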
>>>>>>>>>
>>>>>>>>>> On Nov 20, 2019, at 3:29 AM, Gabor Szadovszky <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks, Fokko.
>>>>>>>>>>
>>>>>>>>>> Ryan, we did not do such measurements yet. I'm afraid I won't have enough time to do that in the next couple of weeks.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Gabor
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 19, 2019 at 6:14 PM Driesprong, Fokko <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Gabor for the explanation. I'd like to change my vote to +1 (non-binding).
>>>>>>>>>>>
>>>>>>>>>>> Cheers, Fokko
>>>>>>>>>>>
>>>>>>>>>>> On Tue, 19 Nov 2019 at 18:03, Ryan Blue <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Gabor, what I meant was: have we tried this with real data to see the effect? I think those results would be helpful.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Nov 18, 2019 at 11:35 PM Gabor Szadovszky <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is not easy to calculate. For the column indexes feature we introduced two new structures saved before the footer: column indexes and offset indexes. If the min/max values are not too long, the truncation might not decrease the file size, because of the offset indexes. Moreover, we also introduced parquet.page.row.count.limit, which might increase the number of pages, which in turn increases the file size.
>>>>>>>>>>>>> The footer itself has also changed and we are saving more values in it: the offsets of the column/offset indexes, the new logical type structures, the CRC checksums (we might have some others).
>>>>>>>>>>>>> So, the size of files with a small amount of data will increase (because of the larger footer). The size of files whose values can be encoded very well (RLE) will probably increase (because we will have more pages). The size of some files whose values are long (>64 bytes by default) might decrease because of the truncation of the min/max values.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Gabor
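Two of the knobs Gabor mentions are per-job settings, so their contribution to file size can be measured directly. A sketch using a Hadoop Configuration: parquet.page.row.count.limit is named above, while the truncation key is my assumption for the 64-byte min/max truncation he describes:

    import org.apache.hadoop.conf.Configuration;

    public class IndexSizeKnobs {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Named above: caps rows per page (new in 1.11). A lower cap means
            // more pages, hence larger files for well-encoded (RLE) columns.
            conf.setInt("parquet.page.row.count.limit", 20000);
            // Assumed property name for the min/max truncation (64-byte default).
            conf.setInt("parquet.columnindex.truncate.length", 64);
            System.out.println(conf.getInt("parquet.page.row.count.limit", -1));
        }
    }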
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Nov 18, 2019 at 6:46 PM Ryan Blue <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Gabor, do we have an idea of the additional overhead for a non-test data file? It should be easy to validate that this doesn't introduce an unreasonable amount of overhead. In some cases, it should actually be smaller, since the column indexes are truncated and the page stats are not.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Nov 18, 2019 at 1:00 AM Gabor Szadovszky <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Fokko,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For the first point: the referenced constructor is private, and Iceberg uses it via reflection. It is not a breaking change. I don't think parquet-mr should keep private methods around only because clients might use them via reflection.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> About the checksum: I agreed to having the CRC checksum write enabled by default because the benchmarks did not show significant performance penalties. See https://github.com/apache/parquet-mr/pull/647 for details.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> About the file size change: 1.11.0 introduces column indexes and CRC checksums, removes the statistics from the page headers, and maybe makes other changes that impact file size. If only file size is in question, I cannot see a breaking change here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Gabor
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Nov 17, 2019 at 9:27 PM Driesprong, Fokko <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Unfortunately, a -1 from my side (non-binding)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've updated Iceberg to Parquet 1.11.0, and found three things:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - We've broken backward compatibility of the constructor of ColumnChunkPageWriteStore: https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e#diff-d007a18083a2431c30a5416f248e0a4bR80 This required a change to the code: https://github.com/apache/incubator-iceberg/pull/297/files#diff-b877faa96f292b851c75fe8bcc1912f8R176 This isn't a hard blocker, but if there will be a new RC, I've submitted a patch: https://github.com/apache/parquet-mr/pull/699
>>>>>>>>>>>>>>>> - Related, and something we need to put in the changelog: checksums are now enabled by default: https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L54 This will impact performance. I would suggest disabling them by default: https://github.com/apache/parquet-mr/pull/700 (https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e#diff-d007a18083a2431c30a5416f248e0a4bR277)
>>>>>>>>>>>>>>>> - Binary compatibility. While updating Iceberg, I've noticed that the split-test was failing: https://github.com/apache/incubator-iceberg/pull/297/files#diff-4b64b7014f259be41b26cfb73d3e6e93L199 The two records are now divided over four Spark partitions. Something in the output has changed, since the files are bigger now. Does anyone have an idea of what's changed, or a way to check this? The only thing I can think of is the checksum mentioned above.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> $ ls -lah ~/Desktop/parquet-1-1*
>>>>>>>>>>>>>>>> -rw-r--r--  1 fokkodriesprong  staff   562B 17 nov 21:09 /Users/fokkodriesprong/Desktop/parquet-1-10-1.parquet
>>>>>>>>>>>>>>>> -rw-r--r--  1 fokkodriesprong  staff   611B 17 nov 21:05 /Users/fokkodriesprong/Desktop/parquet-1-11-0.parquet
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> $ parquet-tools cat /Users/fokkodriesprong/Desktop/parquet-1-10-1.parquet
>>>>>>>>>>>>>>>> id = 1
>>>>>>>>>>>>>>>> data = a
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> $ parquet-tools cat /Users/fokkodriesprong/Desktop/parquet-1-11-0.parquet
>>>>>>>>>>>>>>>> id = 1
>>>>>>>>>>>>>>>> data = a
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> A binary diff here: https://gist.github.com/Fokko/1c209f158299dc2fb5878c5bae4bf6d8
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers, Fokko
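One way to test whether the new CRC checksums account for the size difference Fokko sees is to rewrite the same data with them switched off and compare. A sketch; the property name is my assumption based on the ParquetProperties default Fokko links:

    import org.apache.hadoop.conf.Configuration;

    public class PageChecksumOff {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Assumed key behind the new write-checksum default; pass this conf
            // to the Parquet output format and compare resulting file sizes.
            conf.setBoolean("parquet.page.write-checksum.enabled", false);
            System.out.println(conf.getBoolean("parquet.page.write-checksum.enabled", true));
        }
    }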
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, 16 Nov 2019 at 04:18, Junjie Chen <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>>> Verified signature and checksum, and ran mvn install successfully.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Wang, Yuming <[email protected]> wrote on Thu, Nov 14, 2019 at 2:05 PM:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>>>> Tested Parquet 1.11.0 with the Spark SQL module: build/sbt "sql/test-only" -Phadoop-3.2
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 2019/11/13, 21:33, "Gabor Szadovszky" <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I propose the following RC to be released as the official Apache Parquet 1.11.0 release.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The commit id is 18519eb8e059865652eee3ff0e8593f126701da4
>>>>>>>>>>>>>>>>>> * This corresponds to the tag: apache-parquet-1.11.0-rc7
>>>>>>>>>>>>>>>>>> * https://github.com/apache/parquet-mr/tree/18519eb8e059865652eee3ff0e8593f126701da4
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The release tarball, signature, and checksums are here:
>>>>>>>>>>>>>>>>>> * https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-1.11.0-rc7
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You can find the KEYS file here:
>>>>>>>>>>>>>>>>>> * https://apache.org/dist/parquet/KEYS
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Binary artifacts are staged in Nexus here:
>>>>>>>>>>>>>>>>>> * https://repository.apache.org/content/groups/staging/org/apache/parquet/
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This release includes the changes listed at:
>>>>>>>>>>>>>>>>>> https://github.com/apache/parquet-mr/blob/apache-parquet-1.11.0-rc7/CHANGES.md
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please download, verify, and test.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please vote in the next 72 hours.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [ ] +1 Release this as Apache Parquet 1.11.0
>>>>>>>>>>>>>>>>>> [ ] +0
>>>>>>>>>>>>>>>>>> [ ] -1 Do not release this because...
