t0il3ts0ap opened a new issue #2589:
URL: https://github.com/apache/hudi/issues/2589
**Describe the problem you faced**
Schema evolution is not working when using deltastreamer with a kafka source
and avro schema registry.
In my usecase, I am trying to ingest cdc dumped to kafka by a debezium
connector. The registry is maintained by debezium connector.
On making a simple schema change like `alter table accounts add column
description text default 'test description';`, The change is reflected in
schema registry by debezium. The new column is visible in hive metastore as
well.
I am using a transformer to make some modifications in fields ( which are
running fine normally ) and overriding target schema to null so as to use
DataSet's schema while writing.
Results in warning
```
21/02/22 08:27:51 WARN TaskSetManager: Lost task 0.0 in stage 14.0 (TID 471,
ip-172-31-4-18.ap-south-1.compute.internal, executor 1):
org.apache.hudi.exception.HoodieException: Exception when reading log file
at
org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:250)
at
org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:100)
at
org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:93)
at
org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:75)
at
org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:230)
at
org.apache.hudi.table.action.compact.HoodieSparkMergeOnReadTableCompactor.compact(HoodieSparkMergeOnReadTableCompactor.java:142)
at
org.apache.hudi.table.action.compact.HoodieSparkMergeOnReadTableCompactor.lambda$compact$9ec9d4c7$1(HoodieSparkMergeOnReadTableCompactor.java:105)
at
org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1041)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at
org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
at
org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
at
org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1388)
at
org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
at
org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
at
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.avro.AvroTypeException: Found
hoodie.source.hoodie_source, expecting hoodie.source.hoodie_source, missing
required field description
at
org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
at
org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:130)
at
org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:215)
at
org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
at
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
at
org.apache.hudi.common.table.log.block.HoodieAvroDataBlock.deserializeRecords(HoodieAvroDataBlock.java:157)
at
org.apache.hudi.common.table.log.block.HoodieDataBlock.createRecordsFromContentBytes(HoodieDataBlock.java:128)
at
org.apache.hudi.common.table.log.block.HoodieDataBlock.getRecords(HoodieDataBlock.java:106)
at
org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processDataBlock(AbstractHoodieLogRecordScanner.java:278)
at
org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processQueuedBlocksForInstant(AbstractHoodieLogRecordScanner.java:313)
at
org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:156)
... 29 more
```
And eventually an error
```
21/02/22 08:28:13 WARN ExceptionMapper: Uncaught exception
java.lang.IllegalArgumentException: Last known instant from client was
20210222075752 but server has the following timeline
[[20210222070142__clean__COMPLETED], [20210222070358__clean__COMPLETED],
[20210222070407__deltacommit__COMPLETED], [20210222070426__commit__COMPLETED],
[20210222070432__deltacommit__COMPLETED],
[20210222070457__deltacommit__COMPLETED],
[20210222070518__deltacommit__COMPLETED],
[20210222070542__deltacommit__COMPLETED], [20210222070559__clean__COMPLETED],
[20210222070605__deltacommit__COMPLETED], [20210222070635__commit__COMPLETED],
[20210222070644__deltacommit__COMPLETED],
[20210222070707__deltacommit__COMPLETED],
[20210222070737__deltacommit__COMPLETED],
[20210222070800__deltacommit__COMPLETED], [20210222070818__clean__COMPLETED],
[20210222070825__deltacommit__COMPLETED], [20210222070843__commit__COMPLETED],
[20210222070848__deltacommit__COMPLETED],
[20210222070912__deltacommit__COMPLETED],
[20210222070945__deltacommit__COMPLETED], [20210222071010__deltacommi
t__COMPLETED], [20210222071028__clean__COMPLETED],
[20210222071036__deltacommit__COMPLETED],
[==>20210222071054__compaction__INFLIGHT],
[20210222071059__deltacommit__COMPLETED],
[20210222071532__rollback__COMPLETED],
[20210222071533__deltacommit__COMPLETED],
[20210222072221__rollback__COMPLETED],
[20210222072222__deltacommit__COMPLETED],
[20210222072803__rollback__COMPLETED],
[20210222072804__deltacommit__COMPLETED],
[20210222072935__rollback__COMPLETED],
[20210222072936__deltacommit__COMPLETED], [20210222073001__clean__COMPLETED],
[20210222075751__rollback__COMPLETED],
[20210222075752__deltacommit__COMPLETED], [20210222082746__rollback__COMPLETED]]
at
org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
at
org.apache.hudi.timeline.service.FileSystemViewHandler$ViewHandler.handle(FileSystemViewHandler.java:372)
at
io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22)
at io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606)
at
io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46)
at
io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17)
at
io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143)
at io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41)
at io.javalin.core.JavalinServlet.service(JavalinServlet.kt:107)
at
io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72)
at
org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at
org.apache.hudi.org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
at
org.apache.hudi.org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668)
at
org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at
org.apache.hudi.org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
at
org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at
org.apache.hudi.org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
at
org.apache.hudi.org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:174)
at
org.apache.hudi.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at
org.apache.hudi.org.eclipse.jetty.server.Server.handle(Server.java:502)
at
org.apache.hudi.org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
at
org.apache.hudi.org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
at
org.apache.hudi.org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at
org.apache.hudi.org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at
org.apache.hudi.org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
at
org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at
org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at
org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at
org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at
org.apache.hudi.org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at
org.apache.hudi.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
at
org.apache.hudi.org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
at java.lang.Thread.run(Thread.java:748)
```
**Environment Description**
* Hudi version : 0.7.0
* Spark version : 3.0.1
* Hive version : 3.1.2
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]