[ 
https://issues.apache.org/jira/browse/HUDI-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-721:
--------------------------------
    Fix Version/s: 0.6.0

> AvroConversionUtils is broken for complex types in 0.6
> ------------------------------------------------------
>
>                 Key: HUDI-721
>                 URL: https://issues.apache.org/jira/browse/HUDI-721
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>            Reporter: Alexander Filipchik
>            Priority: Major
>             Fix For: 0.6.0
>
>
> Hi,
> I was working on the upgrade from 0.5 to 0.6 and hit a bug in 
> AvroConversionUtils. I originally blamed it on the Spark parquet to avro schema 
> generator (the convertStructTypeToAvroSchema method), but after some debugging 
> I'm pretty sure the issue is somewhere in AvroConversionHelper.
> What happens: when a complex type is extracted using SqlTransformer (using 
> select bla from <SRC>), where bla is a complex type with arrays of structs, Kryo 
> serialization breaks with:
>  
> {code:java}
> 28701 [dag-scheduler-event-loop] INFO  
> org.apache.spark.scheduler.DAGScheduler  - ResultStage 1 (isEmpty at 
> DeltaSync.java:337) failed in 12.146 s due to Job aborted due to stage 
> failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 
> 0.0 in stage 1.0 (TID 1, localhost, executor driver): 
> org.apache.avro.UnresolvedUnionException: Not in union 
>       at 
> org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:740)
>       at 
> org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:205)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:123)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:192)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:120)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
>       at 
> org.apache.spark.serializer.GenericAvroSerializer.serializeDatum(GenericAvroSerializer.scala:125)
>       at 
> org.apache.spark.serializer.GenericAvroSerializer.write(GenericAvroSerializer.scala:159)
>       at 
> org.apache.spark.serializer.GenericAvroSerializer.write(GenericAvroSerializer.scala:47)
>       at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
>       at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:361)
>       at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:302)
>       at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
>       at 
> org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:351)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:456)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:
> 28702 [main] INFO  org.apache.spark.scheduler.DAGScheduler  - Job 1 failed: 
> isEmpty at DeltaSync.java:337, took 12.149897 s
> 28702 [main] ERROR 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer  - Got error 
> running delta sync once. Shutting down
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 
> (TID 1, localhost, executor driver): 
> org.apache.avro.UnresolvedUnionException: Not in union 
> [{"type":"record","name":"order_item_detail","namespace":"hoodie.source.hoodie_source.order.customer_order.customer_items","fields":[{"name":"external_id","type":[{"type":"record","name":"external_id","namespace":"hoodie.source.hoodie_source.order.customer_order.customer_items.order_item_detail","fields":[{"name":"id","type":["string","null"]},{"name":"display_id","type":["string","null"]},{"name":"exist","type":["boolean","null"]}]},"null"]},{"name":"name","type":["string","null"]},{"name":"sale_price","type":[{"type":"record","name":"sale_price","namespace":"hoodie.source.hoodie_source.order.customer_order.customer_items.order_item_detail","fields":[{"name":"currency_code","type":["string","null"]},{"name":"units","type":["long","null"]},{"name":"nanos","type":["int","null"]},{"name":"exist","type":["boolean","null"]}]},"null"]},{"name":"quantity","type":["int","null"]},{"name":"note","type":["string","null"]},{"name":"customer_item_id","type":["string","null"]},{"name":"menu_customer_item_id","type":["string","null"]},{"name":"entity_path","type":[{"type":"record","name":"entity_path","namespace":"hoodie.source.hoodie_source.order.customer_order.customer_items.order_item_detail","fields":[{"name":"path_nodes","type":[{"type":"array","items":{"type":"record","name":"path_nodes","namespace":"hoodie.source.hoodie_source.order.customer_order.customer_items.order_item_detail.entity_path","fields":[{"name":"id","type":["string","null"]},{"name":"type","type":["string","null"]},{"name":"exist","type":["boolean","null"]}]}},"null"]},{"name":"exist","type":["boolean","null"]}]},"null"]},{"name":"exist","type":["boolean","null"]}]},"null"]:
>  {"external_id": null, "name": "Item 0", "sale_price": {"currency_code": 
> "KRW", "units": 900, "nanos": 0, "exist": null}, "quantity": 1, "note": "Item 
> 0 note", "customer_item_id": "37a49c46-42dd-4306-8ea5-e542bdfc0b0c", 
> "menu_customer_item_id": "", "entity_path": null, "exist": null}
>       at 
> org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:740)
>       at 
> org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:205)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:123)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:192)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:120)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
>       at 
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
>       at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
>       at 
> org.apache.spark.serializer.GenericAvroSerializer.serializeDatum(GenericAvroSerializer.scala:125)
>       at 
> org.apache.spark.serializer.GenericAvroSerializer.write(GenericAvroSerializer.scala:159)
>       at 
> org.apache.spark.serializer.GenericAvroSerializer.write(GenericAvroSerializer.scala:47)
>       at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
>       at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:361)
>       at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:302)
>       at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
>       at 
> org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:351)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:456)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> For a problematic payload, the lookup
> Integer i = union.getIndexNamed(getSchemaName(datum))
> breaks:
> union.getIndexNamed(getSchemaName(datum)) returns null.
> getSchemaName(datum) returns: 
> hoodie.source.hoodie_source.order.customer_items.customer_items.order_item_detail
> but the union's schema is:
> {code:java}
> {"type":"record","name":"order_item_detail",
> "namespace":"hoodie.source.hoodie_source.order.customer_order.customer_items"
> {code}
> customer_items.customer_items is repeated in the result of getSchemaName, 
> so the name does not match the union's namespace. By contrast,
> union.getIndexNamed("hoodie.source.hoodie_source.order.customer_order.customer_items.order_item_detail")
> returns the proper index.
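
The mismatch described above can be reproduced in isolation. What follows is only a minimal sketch, assuming the union resolves record branches by an exact match on the full name (namespace plus record name). It models the lookup with a plain map rather than calling the Avro API, and `UnionLookupSketch`, `indexByName`, and `indexNamed` are hypothetical names used for illustration, not Hudi or Avro code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model (an assumption, not the Avro implementation):
// a union keeps a map from each branch's full name to its position.
class UnionLookupSketch {

    /** Builds a full-name -> branch-index map for the given union branches. */
    public static Map<String, Integer> indexByName(List<String> branchFullNames) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (int i = 0; i < branchFullNames.size(); i++) {
            index.put(branchFullNames.get(i), i);
        }
        return index;
    }

    /** Returns the branch index for a full name, or null when no branch matches exactly. */
    public static Integer indexNamed(Map<String, Integer> index, String fullName) {
        return index.get(fullName);
    }

    public static void main(String[] args) {
        // Namespace taken from the union schema quoted in the report.
        String unionNamespace =
            "hoodie.source.hoodie_source.order.customer_order.customer_items";
        Map<String, Integer> union =
            indexByName(Arrays.asList(unionNamespace + ".order_item_detail", "null"));

        // Full name reported for the datum's own schema; its namespace differs.
        String datumName =
            "hoodie.source.hoodie_source.order.customer_items.customer_items.order_item_detail";

        System.out.println(indexNamed(union, datumName));                             // null
        System.out.println(indexNamed(union, unionNamespace + ".order_item_detail")); // 0
    }
}
```

Under this model a record whose schema carries a different namespace than the union branch can never be resolved, even though its fields are identical, which is consistent with the "Not in union" failure in the trace.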



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
