[
https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sivabalan narayanan resolved HUDI-1716.
---------------------------------------
Resolution: Fixed
> rt view w/ MOR tables fails after schema evolution
> --------------------------------------------------
>
> Key: HUDI-1716
> URL: https://issues.apache.org/jira/browse/HUDI-1716
> Project: Apache Hudi
> Issue Type: Bug
> Components: Storage Management
> Reporter: sivabalan narayanan
> Assignee: Aditya Tiwari
> Priority: Major
> Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> Looks like the realtime view of a MOR table fails when the schema present in an
> existing log file is evolved to add a new field. There are no issues with
> writing, but reading fails.
> More info: [https://github.com/apache/hudi/issues/2675]
>
> gist of the stack trace:
> Caused by: org.apache.avro.AvroTypeException: Found hoodie.hudi_trips_cow.hudi_trips_cow_record, expecting hoodie.hudi_trips_cow.hudi_trips_cow_record, missing required field evolvedField
> at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
> at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:130)
> at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:215)
> at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
> at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock.deserializeRecords(HoodieAvroDataBlock.java:165)
> at org.apache.hudi.common.table.log.block.HoodieDataBlock.createRecordsFromContentBytes(HoodieDataBlock.java:128)
> at org.apache.hudi.common.table.log.block.HoodieDataBlock.getRecords(HoodieDataBlock.java:106)
> at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processDataBlock(AbstractHoodieLogRecordScanner.java:289)
> at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processQueuedBlocksForInstant(AbstractHoodieLogRecordScanner.java:324)
> at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:252)
> ... 24 more
> 21/03/25 11:27:03 WARN TaskSetManager: Lost task 0.0 in stage 83.0 (TID 667, sivabala-c02xg219jgh6.attlocal.net, executor driver): org.apache.hudi.exception.HoodieException: Exception when reading log file
> at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:261)
> at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:100)
> at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:93)
> at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:75)
> at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:230)
> at org.apache.hudi.HoodieMergeOnReadRDD$.scanLog(HoodieMergeOnReadRDD.scala:328)
> at org.apache.hudi.HoodieMergeOnReadRDD$$anon$3.<init>(HoodieMergeOnReadRDD.scala:210)
> at org.apache.hudi.HoodieMergeOnReadRDD.payloadCombineFileIterator(HoodieMergeOnReadRDD.scala:200)
> at org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:77)
>
> Logs from local run:
> [https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198]
> diff with which above logs were generated:
> [https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec]
>
> Steps to reproduce in spark shell:
> # Create a MOR table with schema1.
> # Ingest (with schema1) until log files are created (verify via hudi-cli; it
> took two batches of updates to see a log file).
> # Create a new schema2 with one additional field. Ingest a batch with schema2
> that updates existing records.
> # Read the entire dataset.
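>
> The steps above can be sketched in spark-shell roughly as follows. This is a
> minimal sketch based on the Hudi quickstart conventions; the table name, base
> path, sample columns, and the evolved field name are illustrative, and it
> assumes a spark-shell started with the Hudi bundle on the classpath:
>
> {code:scala}
> import org.apache.spark.sql.SaveMode
> import org.apache.spark.sql.functions.lit
> import spark.implicits._
>
> val tableName = "hudi_trips_mor"              // illustrative name
> val basePath  = "file:///tmp/hudi_trips_mor"  // illustrative path
>
> val hudiOpts = Map(
>   "hoodie.table.name"                        -> tableName,
>   "hoodie.datasource.write.table.type"       -> "MERGE_ON_READ",
>   "hoodie.datasource.write.recordkey.field"  -> "uuid",
>   "hoodie.datasource.write.precombine.field" -> "ts"
> )
>
> // 1. Create the MOR table with schema1 and ingest a first batch.
> val batch1 = Seq(("id1", "a", 1L), ("id2", "b", 1L)).toDF("uuid", "name", "ts")
> batch1.write.format("hudi").options(hudiOpts).mode(SaveMode.Overwrite).save(basePath)
>
> // 2. Update existing records with schema1 until log files appear
> //    (verify via hudi-cli; two update batches sufficed in this report).
> val batch2 = Seq(("id1", "a2", 2L), ("id2", "b2", 2L)).toDF("uuid", "name", "ts")
> batch2.write.format("hudi").options(hudiOpts).mode(SaveMode.Append).save(basePath)
>
> // 3. Evolve to schema2: add one new field and update existing records.
> val batch3 = batch2.withColumn("evolvedField", lit("x")).withColumn("ts", lit(3L))
> batch3.write.format("hudi").options(hudiOpts).mode(SaveMode.Append).save(basePath)
>
> // 4. Read the entire dataset (realtime view) -- this read is where the
> //    AvroTypeException above surfaced.
> spark.read.format("hudi").load(basePath).show(false)
> {code}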
>
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)