+1 for trimming the schema down and iterating

On Thu, Dec 19, 2019 at 10:07 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> Hi Nishith,
>
> I do not want to diverge this thread. I looked into the JIRA link that you sent:
> https://issues.apache.org/jira/projects/PARQUET/issues/PARQUET-1656?filter=allopenissues
> In the example of this JIRA, I see that the schema has changed, especially with respect to the addition of a default value. This has never worked for me in the past. The rules of Avro schema evolution are as below (I can't put an authoritative link, but these are derived from experience):
>
> 1. Always provide a default value for the field to start with.
> 2. Never change a field's data type.
> 3. If a new field is added in the future, always provide a default value (back to rule 1).
> 4. Never rename an existing field; use aliases instead.
>
> In the example given in the JIRA, we see that a default value is being added later, when it was missed in the first step. I have highlighted one field in the PS section of the email below. This has not worked for me, or at least I can say that it has caused issues later on.
>
> Coming back to this thread, I think the issue Ethan is having is not (or should not be) directly related to that JIRA. Ethan seems to be using DeltaStreamer from a source but is encountering issues on the 2nd load, whilst the 1st one goes through without issues. So he is not necessarily changing the schema definitions manually. If you have other thoughts, please feel free; we can discuss in another thread.
>
> Ethan - please see if you can replicate the issue with a subset of the schema. This involves effort, but from past experience I can always vouch for this practice: it enables others to help you faster, or at least lets us understand, file and fix a bug faster.
>
> Thanks,
> Kabeer.
>
> ____________________________
> First version of schema:
>
> {
>   "default": null,
>   "name": "master_cluster",
>   "type": [
>     "null",
>     {
>       "fields": [
>         { "name": "uuid", "type": "string" },
>         { "name": "namespace", "type": "string" },
>         { "name": "version", "type": "long" }
>       ],
>       "name": "master_cluster",
>       "type": "record"
>     }
>   ]
> },
>
> ___________________________________
> Second version of the schema:
>
> {
>   "default": null,
>   "name": "master_cluster",
>   "type": [
>     "null",
>     {
>       "fields": [
>         { "default": null, "name": "uuid", "type": [ "null", "string" ] },
>         { "default": null, "name": "namespace", "type": [ "null", "string" ] },
>         { "default": null, "name": "version", "type": [ "null", "long" ] }
>       ],
>       "name": "VORGmaster_cluster",
>       "type": "record"
>     }
>   ]
> },
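>
> To make rules 1-4 concrete, here is a minimal sketch that checks the two versions above against each other with Avro's SchemaCompatibility API (the .avsc file names and the plain Java main are only for illustration):
>
> import java.io.File;
> import org.apache.avro.Schema;
> import org.apache.avro.SchemaCompatibility;
> import org.apache.avro.SchemaCompatibility.SchemaPairCompatibility;
>
> public class CheckSchemaEvolution {
>     public static void main(String[] args) throws Exception {
>         // schema_v1.avsc / schema_v2.avsc are illustrative file names holding
>         // the first and second schema versions shown above
>         Schema writer = new Schema.Parser().parse(new File("schema_v1.avsc"));
>         Schema reader = new Schema.Parser().parse(new File("schema_v2.avsc"));
>
>         // Can records written with the old (writer) schema be read with the new (reader) one?
>         SchemaPairCompatibility result =
>                 SchemaCompatibility.checkReaderWriterCompatibility(reader, writer);
>
>         System.out.println(result.getType());        // COMPATIBLE or INCOMPATIBLE
>         System.out.println(result.getDescription()); // points at the offending spot, e.g. a missing default
>     }
> }
>
> With these two versions, the rename to VORGmaster_cluster should already be flagged as incompatible unless the new record declares an alias for the old name.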
>
> On Dec 19 2019, at 10:59 pm, Y Ethan Guo <ethan.guoyi...@gmail.com> wrote:
> > Got it. Thanks for the clarification.
> >
> > On Thu, Dec 19, 2019 at 2:54 PM nishith agarwal <n3.nas...@gmail.com> wrote:
> > > Ethan,
> > > There isn't one available in open source; it's an internal build we have.
> > >
> > > -Nishith
> > >
> > > On Thu, Dec 19, 2019 at 2:50 PM Y Ethan Guo <ethan.guoyi...@gmail.com> wrote:
> > > > Thanks Kabeer and Nishith for the responses.
> > > > The schema hasn't been changed. I'm now trying to reproduce the problem locally with some synthetic data, given that it's expensive to iterate in my cluster testing setup.
> > > >
> > > > @Nishith, thanks for the pointer. Is there an existing build of parquet-avro v1.8.1 with the fix? I don't see the fix attached to the ticket. I suppose that I also need to rebuild the Hudi utilities bundle to pick that up.
> > > >
> > > > On Thu, Dec 19, 2019 at 1:51 PM nishith agarwal <n3.nas...@gmail.com> wrote:
> > > > > Ethan,
> > > > > Unless this is a backwards-incompatible schema change, this seems related to a parquet-avro reader bug we've seen before; find more details here:
> > > > > https://issues.apache.org/jira/projects/PARQUET/issues/PARQUET-1656?filter=allopenissues
> > > > > There's a fix for the parquet-avro reader for 1.8.1 which we patched and which works for us in production. There isn't a proper fix available for later parquet versions since the code has changed quite a bit, which is also the reason why the patch for 1.8.1 is not being upstreamed: fewer people are using it.
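> > > > >
> > > > > In case it helps to confirm the failure mode locally, here is a rough, self-contained sketch (the file path, record name and field name are made up) that writes a map-typed column with parquet-avro and reads it back with an evolved, nullable reader schema; on the affected 1.8.x readers this kind of read is what can end in the "required binary key (UTF8) is not a group" cast error:
> > > > >
> > > > > import java.util.Collections;
> > > > > import org.apache.avro.Schema;
> > > > > import org.apache.avro.generic.GenericRecord;
> > > > > import org.apache.avro.generic.GenericRecordBuilder;
> > > > > import org.apache.hadoop.conf.Configuration;
> > > > > import org.apache.hadoop.fs.Path;
> > > > > import org.apache.parquet.avro.AvroParquetReader;
> > > > > import org.apache.parquet.avro.AvroParquetWriter;
> > > > > import org.apache.parquet.avro.AvroReadSupport;
> > > > > import org.apache.parquet.hadoop.ParquetReader;
> > > > > import org.apache.parquet.hadoop.ParquetWriter;
> > > > >
> > > > > public class MapReadSketch {
> > > > >     public static void main(String[] args) throws Exception {
> > > > >         Path path = new Path("/tmp/map-read-sketch.parquet"); // illustrative location
> > > > >
> > > > >         // Writer schema: a record with a required map field.
> > > > >         Schema writerSchema = new Schema.Parser().parse(
> > > > >             "{\"type\":\"record\",\"name\":\"rec\",\"fields\":["
> > > > >           + "{\"name\":\"attrs\",\"type\":{\"type\":\"map\",\"values\":\"string\"}}]}");
> > > > >
> > > > >         try (ParquetWriter<GenericRecord> writer =
> > > > >                  AvroParquetWriter.<GenericRecord>builder(path).withSchema(writerSchema).build()) {
> > > > >             writer.write(new GenericRecordBuilder(writerSchema)
> > > > >                 .set("attrs", Collections.singletonMap("k", "v")).build());
> > > > >         }
> > > > >
> > > > >         // Reader schema: the same field, evolved to a nullable union (like the second schema version).
> > > > >         Schema readerSchema = new Schema.Parser().parse(
> > > > >             "{\"type\":\"record\",\"name\":\"rec\",\"fields\":["
> > > > >           + "{\"name\":\"attrs\",\"type\":[\"null\",{\"type\":\"map\",\"values\":\"string\"}],"
> > > > >           + "\"default\":null}]}");
> > > > >
> > > > >         Configuration conf = new Configuration();
> > > > >         AvroReadSupport.setAvroReadSchema(conf, readerSchema); // force the evolved schema on read
> > > > >
> > > > >         try (ParquetReader<GenericRecord> reader =
> > > > >                  AvroParquetReader.<GenericRecord>builder(path).withConf(conf).build()) {
> > > > >             System.out.println(reader.read()); // affected parquet-avro readers may fail in AvroRecordConverter here
> > > > >         }
> > > > >     }
> > > > > }
> > > > >
> > > > > Swapping in the actual table schema for the two inline ones should get closer to the real failure.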
> > > > >
> > > > > -Nishith
> > > > >
> > > > > On Thu, Dec 19, 2019 at 12:34 PM Kabeer Ahmed <kab...@linuxmail.org> wrote:
> > > > > > Hi Ethan,
> > > > > > It is often tricky to debug or help with issues when I do not have an idea of the data. My "guess" is that your schema is changing. This could be related to:
> > > > > > https://stackoverflow.com/a/42946528/4517001
> > > > > > It may help if you could send sample data from the first pass that went through without issues and from the second set of data that caused the issue. Please check that you are allowed to share the data.
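> > > > > >
> > > > > > If sharing data turns out not to be possible, even dumping the schema that the first run actually wrote may help narrow it down. A small sketch along those lines (the class name is made up; pass the path of any data file produced by the first commit):
> > > > > >
> > > > > > import org.apache.hadoop.conf.Configuration;
> > > > > > import org.apache.hadoop.fs.Path;
> > > > > > import org.apache.parquet.hadoop.ParquetFileReader;
> > > > > > import org.apache.parquet.hadoop.metadata.ParquetMetadata;
> > > > > >
> > > > > > public class DumpParquetSchema {
> > > > > >     public static void main(String[] args) throws Exception {
> > > > > >         // e.g. one of the .parquet files under the table base path from the first (insert-only) run
> > > > > >         ParquetMetadata footer = ParquetFileReader.readFooter(new Configuration(), new Path(args[0]));
> > > > > >
> > > > > >         // The Parquet schema the file was physically written with.
> > > > > >         System.out.println(footer.getFileMetaData().getSchema());
> > > > > >
> > > > > >         // The key/value metadata normally carries the Avro schema used by the writer,
> > > > > >         // which can be diffed against the schema the second run is reading with.
> > > > > >         footer.getFileMetaData().getKeyValueMetaData()
> > > > > >               .forEach((k, v) -> System.out.println(k + " = " + v));
> > > > > >     }
> > > > > > }
> > > > > >
> > > > > > Diffing that output between a file from the first commit and the schema the second run reads with usually makes an accidental evolution obvious.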
> > > > > > Thanks
> > > > > > Kabeer.
> > > > > >
> > > > > > On Dec 19 2019, at 7:33 pm, Y Ethan Guo <ethan.guoyi...@gmail.com> wrote:
> > > > > > > Hi folks,
> > > > > > >
> > > > > > > I'm testing a new Deltastreamer job in a cluster which incrementally pulls data from an upstream Hudi table and upserts the dataset into another table. The first run of the Deltastreamer job, which involves only inserts, succeeded. The second run of the job, which involves updates, throws the following exception. I'm using a snapshot build of Hudi: 0.4.8-SNAPSHOT [1]. I believe this is related to schema, but I don't know how I should debug and fix this. Any suggestion is appreciated.
> > > > > > >
> > > > > > > org.apache.spark.SparkException: Job aborted due to stage failure: Task 33 in stage 78.0 failed 4 times, most recent failure: Lost task 33.3 in stage 78.0 (TID 13283, executor 5): com.uber.hoodie.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :33
> > > > > > > at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:271)
> > > > > > > at com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:442)
> > > > > > > at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
> > > > > > > at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
> > > > > > > at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:844)
> > > > > > > at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:844)
> > > > > > > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> > > > > > > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> > > > > > > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> > > > > > > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> > > > > > > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> > > > > > > at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
> > > > > > > at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
> > > > > > > at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1055)
> > > > > > > at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029)
> > > > > > > at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969)
> > > > > > > at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029)
> > > > > > > at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760)
> > > > > > > at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
> > > > > > > at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
> > > > > > > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> > > > > > > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> > > > > > > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> > > > > > > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> > > > > > > at org.apache.spark.scheduler.Task.run(Task.scala:108)
> > > > > > > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
> > > > > > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > > > > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > > > > > > at java.lang.Thread.run(Thread.java:748)
> > > > > > > Caused by: com.uber.hoodie.exception.HoodieException: com.uber.hoodie.exception.HoodieException: java.util.concurrent.ExecutionException: com.uber.hoodie.exception.HoodieException: operation has failed
> > > > > > > at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:206)
> > > > > > > at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:181)
> > > > > > > at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:263)
> > > > > > > ... 28 more
> > > > > > > Caused by: com.uber.hoodie.exception.HoodieException: java.util.concurrent.ExecutionException: com.uber.hoodie.exception.HoodieException: operation has failed
> > > > > > > at com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:146)
> > > > > > > at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:204)
> > > > > > > ... 30 more
> > > > > > > Caused by: java.util.concurrent.ExecutionException: com.uber.hoodie.exception.HoodieException: operation has failed
> > > > > > > at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> > > > > > > at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> > > > > > > at com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:144)
> > > > > > > ... 31 more
> > > > > > > Caused by: com.uber.hoodie.exception.HoodieException: operation has failed
> > > > > > > at com.uber.hoodie.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:232)
> > > > > > > at com.uber.hoodie.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:211)
> > > > > > > at com.uber.hoodie.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:49)
> > > > > > > at com.uber.hoodie.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:262)
> > > > > > > at com.uber.hoodie.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
> > > > > > > at com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:124)
> > > > > > > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > > > > > ... 3 more
> > > > > > > Caused by: java.lang.ClassCastException: required binary key (UTF8) is not a group
> > > > > > > at org.apache.parquet.schema.Type.asGroupType(Type.java:202)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter$MapConverter.<init>(AvroRecordConverter.java:821)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:210)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter.access$200(AvroRecordConverter.java:63)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.<init>(AvroRecordConverter.java:435)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:385)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:206)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter.access$200(AvroRecordConverter.java:63)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter$MapConverter$MapKeyValueConverter.<init>(AvroRecordConverter.java:872)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter$MapConverter.<init>(AvroRecordConverter.java:822)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:210)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:112)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:79)
> > > > > > > at org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
> > > > > > > at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:132)
> > > > > > > at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:175)
> > > > > > > at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:149)
> > > > > > > at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:125)
> > > > > > > at com.uber.hoodie.func.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:45)
> > > > > > > at com.uber.hoodie.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:44)
> > > > > > > at com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:94)
> > > > > > > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > > > > > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > > > > > > ... 4 more
> > > > > > >
> > > > > > > Driver stacktrace:
> > > > > > > at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1517)
> > > > > > > at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1505)
> > > > > > > at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1504)
> > > > > > > at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> > > > > > > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> > > > > > > at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1504)
> > > > > > > at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
> > > > > > > at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
> > > > > > > at scala.Option.foreach(Option.scala:257)
> > > > > > > at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
> > > > > > > at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1732)
> > > > > > > at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1687)
> > > > > > > at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1676)
> > > > > > > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> > > > > > > at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
> > > > > > > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2029)
> > > > > > > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
> > > > > > > at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1089)
> > > > > > > at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> > > > > > > at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> > > > > > > at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> > > > > > > at org.apache.spark.rdd.RDD.fold(RDD.scala:1083)
> > > > > > > at org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$sum$1.apply$mcD$sp(DoubleRDDFunctions.scala:35)
> > > > > > > at org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$sum$1.apply(DoubleRDDFunctions.scala:35)
> > > > > > > at org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$sum$1.apply(DoubleRDDFunctions.scala:35)
> > > > > > > at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> > > > > > > at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> > > > > > > at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> > > > > > > at org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:34)
> > > > > > > at org.apache.spark.api.java.JavaDoubleRDD.sum(JavaDoubleRDD.scala:165)
> > > > > > > at com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:279)
> > > > > > > ... 47 elided
> > > > > > > Caused by: com.uber.hoodie.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :33
> > > > > > > at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:271)
> > > > > > > at com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:442)
> > > > > > > at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
> > > > > > > at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
> > > > > > > at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:844)
> > > > > > > at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:844)
> > > > > > > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> > > > > > > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> > > > > > > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> > > > > > > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> > > > > > > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> > > > > > > at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
> > > > > > > at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
> > > > > > > at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1055)
> > > > > > > at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029)
> > > > > > > at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969)
> > > > > > > at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029)
> > > > > > > at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760)
> > > > > > > at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
> > > > > > > at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
> > > > > > > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> > > > > > > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> > > > > > > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> > > > > > > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> > > > > > > at org.apache.spark.scheduler.Task.run(Task.scala:108)
> > > > > > > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
> > > > > > > ... 3 more
> > > > > > > Caused by: com.uber.hoodie.exception.HoodieException: com.uber.hoodie.exception.HoodieException: java.util.concurrent.ExecutionException: com.uber.hoodie.exception.HoodieException: operation has failed
> > > > > > > at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:206)
> > > > > > > at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:181)
> > > > > > > at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:263)
> > > > > > > ... 28 more
> > > > > > > Caused by: com.uber.hoodie.exception.HoodieException: java.util.concurrent.ExecutionException: com.uber.hoodie.exception.HoodieException: operation has failed
> > > > > > > at com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:146)
> > > > > > > at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:204)
> > > > > > > ... 30 more
> > > > > > > Caused by: java.util.concurrent.ExecutionException: com.uber.hoodie.exception.HoodieException: operation has failed
> > > > > > > at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> > > > > > > at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> > > > > > > at com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:144)
> > > > > > > ... 31 more
> > > > > > > Caused by: com.uber.hoodie.exception.HoodieException: operation has failed
> > > > > > > at com.uber.hoodie.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:232)
> > > > > > > at com.uber.hoodie.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:211)
> > > > > > > at com.uber.hoodie.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:49)
> > > > > > > at com.uber.hoodie.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:262)
> > > > > > > at com.uber.hoodie.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
> > > > > > > at com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:124)
> > > > > > > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > > > > > ... 3 more
> > > > > > > Caused by: java.lang.ClassCastException: required binary key (UTF8) is not a group
> > > > > > > at org.apache.parquet.schema.Type.asGroupType(Type.java:202)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter$MapConverter.<init>(AvroRecordConverter.java:821)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:210)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter.access$200(AvroRecordConverter.java:63)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.<init>(AvroRecordConverter.java:435)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:385)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:206)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter.access$200(AvroRecordConverter.java:63)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter$MapConverter$MapKeyValueConverter.<init>(AvroRecordConverter.java:872)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter$MapConverter.<init>(AvroRecordConverter.java:822)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:210)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:112)
> > > > > > > at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:79)
> > > > > > > at org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
> > > > > > > at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:132)
> > > > > > > at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:175)
> > > > > > > at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:149)
> > > > > > > at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:125)
> > > > > > > at com.uber.hoodie.func.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:45)
> > > > > > > at com.uber.hoodie.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:44)
> > > > > > > at com.uber.hoodie.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:94)
> > > > > > > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > > > > > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > > > > > > ... 4 more
> > > > > > >
> > > > > > > [1]
> > > > > > > https://oss.sonatype.org/content/repositories/snapshots/com/uber/hoodie/hoodie-utilities-bundle/0.4.8-SNAPSHOT/hoodie-utilities-bundle-0.4.8-20190531.060546-1.jar
> > > > > > >
> > > > > > > Thanks,
> > > > > > > - Ethan