sbernauer edited a comment on pull request #2012:
URL: https://github.com/apache/hudi/pull/2012#issuecomment-847940734


   Hi @nsivabalan,
   
   we have multiple schema versions of the events we consume. We use Kafka and the Confluent Schema Registry. I think all the events in Kafka were written with schema version 9.
   My test case is to read some events with schema version 8, then switch to schema version 9 and consume some evolved events. We use a COW table with INSERTs only (dropping duplicates) and, for most of our applications, no transformation.
   
   With the patch from https://github.com/apache/hudi/pull/2927, starting from an empty directory and configured with schema version 8 as shown below, the ingestion throws the following exception in the executors. Reading with schema version 9 works fine.
   
   ```
   schemaRegistryUrl: https://eventbus-schema-bs-qa.server.lan/subjects/MyEvent-v1/versions/8
   # Sets
   --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
   # and
   curl --silent $SCHEMA_REGISTRY_URL | jq -r -c '.schema' | jq '.' > /tmp/schema_source.json
   cp /tmp/schema_source.json /tmp/schema_target.json
   
   21/05/25 14:45:55 ERROR HoodieWriteHandle: Error writing record HoodieRecord{key=HoodieKey { recordKey=sourceEventHeader.happenedTimestamp:1621953763077,sourceEventHeader.eventId:143d1259-01c2-4346-a3c4-85b2e3325ff3 partitionPath=2021/05/25}, currentLocation='null', newLocation='null'}
    java.lang.ArrayIndexOutOfBoundsException: 22
           at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:424)
           at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
           at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
           at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
           at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
           at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
           at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
           at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
           at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
           at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
           at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
           at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
           at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
           at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
           at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
           at org.apache.hudi.avro.HoodieAvroUtils.bytesToAvro(HoodieAvroUtils.java:136)
           at org.apache.hudi.avro.HoodieAvroUtils.bytesToAvro(HoodieAvroUtils.java:126)
           at org.apache.hudi.common.model.OverwriteWithLatestAvroPayload.getInsertValue(OverwriteWithLatestAvroPayload.java:69)
           at org.apache.hudi.execution.HoodieLazyInsertIterable$HoodieInsertValueGenResult.<init>(HoodieLazyInsertIterable.java:88)
           at org.apache.hudi.execution.HoodieLazyInsertIterable.lambda$getTransformFunction$0(HoodieLazyInsertIterable.java:101)
           at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.insertRecord(BoundedInMemoryQueue.java:190)
           at org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:46)
           at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   ```
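
The trace fails in `ResolvingDecoder.readIndex`, which reads a union branch index. One plausible reading (not confirmed in this thread) is that bytes written with schema version 9 are decoded as if they had been written with version 8. Avro's binary encoding carries no field tags, so the decoder then interprets the bytes of the extra `voluntary` array as whatever field comes next. The following is a minimal, self-contained sketch of that effect using a hand-rolled toy encoder, not the Avro library or Hudi code. The record layout (two string arrays followed by a `["null", "string"]` union) and the function names `write_v9`, `read_v8`, `read_v9` are assumptions made for illustration only.

```python
import io


def zigzag_encode(n: int) -> bytes:
    """Encode a signed integer as an Avro zigzag varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def zigzag_decode(buf: io.BytesIO) -> int:
    """Decode one Avro zigzag varint from the stream."""
    shift, z = 0, 0
    while True:
        b = buf.read(1)[0]
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            return (z >> 1) ^ -(z & 1)
        shift += 7


def encode_string(s: str) -> bytes:
    raw = s.encode("utf-8")
    return zigzag_encode(len(raw)) + raw


def encode_string_array(items) -> bytes:
    # One block with all items (if any), then a terminating zero-count block.
    body = zigzag_encode(len(items)) + b"".join(map(encode_string, items)) if items else b""
    return body + zigzag_encode(0)


def encode_union(value) -> bytes:
    # Union ["null", "string"]: branch index, then the branch value.
    return zigzag_encode(0) if value is None else zigzag_encode(1) + encode_string(value)


def decode_string(buf: io.BytesIO) -> str:
    return buf.read(zigzag_decode(buf)).decode("utf-8")


def decode_string_array(buf: io.BytesIO):
    items = []
    count = zigzag_decode(buf)
    while count != 0:
        items.extend(decode_string(buf) for _ in range(count))
        count = zigzag_decode(buf)
    return items


def decode_union(buf: io.BytesIO):
    index = zigzag_decode(buf)
    if index not in (0, 1):
        raise IndexError(f"union branch index {index} out of range")
    return None if index == 0 else decode_string(buf)


def write_v9(optional, voluntary, next_field) -> bytes:
    """Toy 'version 9' layout: optional, voluntary, then a union field."""
    return encode_string_array(optional) + encode_string_array(voluntary) + encode_union(next_field)


def read_v9(data: bytes):
    buf = io.BytesIO(data)
    return decode_string_array(buf), decode_string_array(buf), decode_union(buf)


def read_v8(data: bytes):
    """Toy 'version 8' layout: no voluntary field, so the union is read too early."""
    buf = io.BytesIO(data)
    return decode_string_array(buf), decode_union(buf)
```

With three `voluntary` entries, `read_v8(write_v9(["email"], ["phone", "address", "birthdate"], None))` reads the array's block count as a union branch index and raises `IndexError`, while `read_v9` on the same bytes round-trips. In real Avro the usual remedy is to hand the decoder the actual writer schema alongside the reader schema so that schema resolution can skip or default the differing fields.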
   
   The schema difference between versions 8 and 9 follows. The added field sits several levels deep in nested records.
   ```
   $ curl https://eventbus-schema-bs-qa.server.lan/subjects/MyEvent-v1/versions/8 | jq -r '.schema' | jq > 8
   $ curl https://eventbus-schema-bs-qa.server.lan/subjects/MyEvent-v1/versions/9 | jq -r '.schema' | jq > 9
   
   $ diff -U 5 8 9
   --- 8   2021-05-25 16:51:21.416603077 +0200
   +++ 9   2021-05-25 16:51:25.072629744 +0200
   @@ -326,10 +326,22 @@
                    "type": "string",
                    "avro.java.string": "String"
                  }
                },
                "doc": "* List of optional claim names"
   +          },
   +          {
   +            "name": "voluntary",
   +            "type": {
   +              "type": "array",
   +              "items": {
   +                "type": "string",
   +                "avro.java.string": "String"
   +              }
   +            },
   +            "doc": "* List of voluntary claim names",
   +            "default": []
              }
            ],
            "version": "1.0.0"
          },
          "doc": "* Info about the requested claims"
   ```
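
For deeply nested schemas, a text diff shows the change but not the full path of the new field. The sketch below is one possible way to list fields that exist only in the newer schema, together with whether each declares a `default`. It uses only the Python standard library and walks the parsed schema JSON; the helper names `iter_fields` and `added_fields`, and the `claims` parent field in the usage example, are hypothetical and not taken from the real `MyEvent` schema. It does not follow named type references.

```python
def iter_fields(schema, prefix=""):
    """Yield (dotted_path, field_dict) for every record field, descending
    into nested records, arrays, maps and unions."""
    if isinstance(schema, list):  # union: visit every branch
        for branch in schema:
            yield from iter_fields(branch, prefix)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            for field in schema.get("fields", []):
                path = f"{prefix}.{field['name']}" if prefix else field["name"]
                yield path, field
                yield from iter_fields(field["type"], path)
        elif kind == "array":
            yield from iter_fields(schema["items"], prefix)
        elif kind == "map":
            yield from iter_fields(schema["values"], prefix)
    # Primitive type names and named references are plain strings: nothing to descend into.


def added_fields(old_schema, new_schema):
    """Map each field path present only in new_schema to True if it has a default."""
    old_paths = {path for path, _ in iter_fields(old_schema)}
    return {
        path: "default" in field
        for path, field in iter_fields(new_schema)
        if path not in old_paths
    }


# Usage with a cut-down, hypothetical schema pair shaped like the diff above.
string_array = {"type": "array", "items": {"type": "string", "avro.java.string": "String"}}
v8 = {"type": "record", "name": "MyEvent", "fields": [
    {"name": "claims", "type": {"type": "record", "name": "Claims", "fields": [
        {"name": "optional", "type": string_array},
    ]}},
]}
v9 = {"type": "record", "name": "MyEvent", "fields": [
    {"name": "claims", "type": {"type": "record", "name": "Claims", "fields": [
        {"name": "optional", "type": string_array},
        {"name": "voluntary", "type": string_array, "default": []},
    ]}},
]}
result = added_fields(v8, v9)
```

Here `result` maps `claims.voluntary` to `True`. A new field that carries a default, as `voluntary` does in the diff, is what allows a reader on the new schema to resolve records written with the old one.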



