sbernauer commented on pull request #2012: URL: https://github.com/apache/hudi/pull/2012#issuecomment-847940734
Hi @nsivabalan, we have multiple schema versions of the events we consume. We use Kafka and Confluent Schema Registry. I think all the events in Kafka are written with schema version 9. My test case would be to read some events with schema version 8, then switch to schema version 9 and consume some evolved events. We use a COW table and INSERTs only (with dropping of duplicates).

With the patch in https://github.com/apache/hudi/pull/2927, starting from an empty directory, the ingestion throws this exception in the executors. Reading with schema version 9 works fine.

```
schemaRegistryUrl: https://eventbus-schema-bs-qa.server.lan/subjects/MyEvent-v1/versions/8
# Sets --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
# and
curl --silent $SCHEMA_REGISTRY_URL | jq -r -c '.schema' | jq '.' > /tmp/schema_source.json
cp /tmp/schema_source.json /tmp/schema_target.json

21/05/25 14:45:55 ERROR HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=sourceEventHeader.happenedTimestamp:1621953763077,sourceEventHeader.eventId:143d1259-01c2-4346-a3c4-85b2e3325ff3 partitionPath=2021/05/25}, currentLocation='null', newLocation='null'}
java.lang.ArrayIndexOutOfBoundsException: 22
    at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:424)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
    at org.apache.hudi.avro.HoodieAvroUtils.bytesToAvro(HoodieAvroUtils.java:136)
    at org.apache.hudi.avro.HoodieAvroUtils.bytesToAvro(HoodieAvroUtils.java:126)
    at org.apache.hudi.common.model.OverwriteWithLatestAvroPayload.getInsertValue(OverwriteWithLatestAvroPayload.java:69)
    at org.apache.hudi.execution.HoodieLazyInsertIterable$HoodieInsertValueGenResult.<init>(HoodieLazyInsertIterable.java:88)
    at org.apache.hudi.execution.HoodieLazyInsertIterable.lambda$getTransformFunction$0(HoodieLazyInsertIterable.java:101)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.insertRecord(BoundedInMemoryQueue.java:190)
    at org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:46)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```

Here is the schema difference between versions 8 and 9. The new field is nested several levels deep.
```
$ curl https://eventbus-schema-bs-qa.server.lan/subjects/MyEvent-v1/versions/8 | jq -r '.schema' | jq > 8
$ curl https://eventbus-schema-bs-qa.server.lan/subjects/MyEvent-v1/versions/9 | jq -r '.schema' | jq > 9
$ diff -U 5 8 9
--- 8 2021-05-25 16:51:21.416603077 +0200
+++ 9 2021-05-25 16:51:25.072629744 +0200
@@ -326,10 +326,22 @@
                   "type": "string",
                   "avro.java.string": "String"
                 }
               },
               "doc": "* List of optional claim names"
+            },
+            {
+              "name": "voluntary",
+              "type": {
+                "type": "array",
+                "items": {
+                  "type": "string",
+                  "avro.java.string": "String"
+                }
+              },
+              "doc": "* List of voluntary claim names",
+              "default": []
             }
           ],
           "version": "1.0.0"
         },
         "doc": "* Info about the requested claims"
```
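For anyone trying to reproduce this, a quick sanity check on the two schema files may help rule out the schema change itself. Avro's schema resolution lets a reader fill in a field that is missing from the writer's schema only when that field declares a `default`. The sketch below is a minimal, stdlib-only Python helper (not part of Hudi or Avro) that walks two schema JSON documents and lists the fields added in the newer one, along with whether each carries a default. The function names and the small example schemas are hypothetical. Only `MyEvent` and `voluntary` come from the diff above, and named type references (a type given as a bare string name) are not resolved.

```python
import json


def _records(schema):
    """Yield record schemas reachable through unions, arrays, and maps."""
    if isinstance(schema, list):  # union: inspect every branch
        for branch in schema:
            yield from _records(branch)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            yield schema
        elif kind == "array":
            yield from _records(schema["items"])
        elif kind == "map":
            yield from _records(schema["values"])
        elif isinstance(kind, (dict, list)):  # schema wrapped in another "type"
            yield from _records(kind)


def added_fields(old, new, path=""):
    """Yield (field_path, has_default) for fields in `new` that `old` lacks."""
    old_records = {rec["name"]: rec for rec in _records(old)}
    for rec in _records(new):
        prev = old_records.get(rec["name"])
        if prev is None:
            continue  # entirely new record type: out of scope for this sketch
        old_by_name = {f["name"]: f for f in prev["fields"]}
        for field in rec["fields"]:
            field_path = f"{path}{rec['name']}.{field['name']}"
            if field["name"] not in old_by_name:
                yield field_path, "default" in field
            else:
                # Same field in both versions: descend into its type.
                yield from added_fields(
                    old_by_name[field["name"]]["type"], field["type"], field_path + "/"
                )


# Hypothetical, heavily reduced stand-ins for schema versions 8 and 9.
v8 = json.loads("""
{"type": "record", "name": "MyEvent", "fields": [
  {"name": "claims", "type": {"type": "record", "name": "Claims", "fields": [
    {"name": "optional", "type": {"type": "array", "items": "string"}}
  ]}}
]}
""")
v9 = json.loads(json.dumps(v8))  # deep copy via a JSON round trip
v9["fields"][0]["type"]["fields"].append(
    {"name": "voluntary", "type": {"type": "array", "items": "string"}, "default": []}
)

for field_path, has_default in added_fields(v8, v9):
    print(field_path, "has default" if has_default else "MISSING default")
```

With the real files, one would load the `8` and `9` files produced by the `curl` commands above with `json.load` instead of the inline stand-ins. In the diff above the added `voluntary` field does declare `"default": []`, so this check alone would not explain the exception.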
