[ https://issues.apache.org/jira/browse/HUDI-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-1716:
--------------------------------------
    Description: 
It looks like the realtime view of an MOR table fails when the schema present in an existing log 
file is evolved to add a new field. There are no issues with writing, but reading fails.

More info: [https://github.com/apache/hudi/issues/2675]
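For context on why the read fails (a hedged sketch, not the Hudi fix itself): Avro's schema-resolution rules only allow a reader schema to add a field over data written with an older schema if that field declares a default. The field name evolvedField below comes from the stack trace; the type and the rest of the declaration are illustrative. A backward-compatible way to declare the new field would be:

```json
{
  "name": "evolvedField",
  "type": ["null", "string"],
  "default": null
}
```

Without such a default, GenericDatumReader throws the AvroTypeException shown in the stack trace when decoding log blocks that were written with the old schema.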

 

Gist of the stack trace:

Caused by: org.apache.avro.AvroTypeException: Found hoodie.hudi_trips_cow.hudi_trips_cow_record, expecting hoodie.hudi_trips_cow.hudi_trips_cow_record, missing required field evolvedField
	at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
	at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
	at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:130)
	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:215)
	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
	at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock.deserializeRecords(HoodieAvroDataBlock.java:165)
	at org.apache.hudi.common.table.log.block.HoodieDataBlock.createRecordsFromContentBytes(HoodieDataBlock.java:128)
	at org.apache.hudi.common.table.log.block.HoodieDataBlock.getRecords(HoodieDataBlock.java:106)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processDataBlock(AbstractHoodieLogRecordScanner.java:289)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processQueuedBlocksForInstant(AbstractHoodieLogRecordScanner.java:324)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:252)
	... 24 more

21/03/25 11:27:03 WARN TaskSetManager: Lost task 0.0 in stage 83.0 (TID 667, sivabala-c02xg219jgh6.attlocal.net, executor driver): org.apache.hudi.exception.HoodieException: Exception when reading log file
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:261)
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:100)
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:93)
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:75)
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:230)
	at org.apache.hudi.HoodieMergeOnReadRDD$.scanLog(HoodieMergeOnReadRDD.scala:328)
	at org.apache.hudi.HoodieMergeOnReadRDD$$anon$3.<init>(HoodieMergeOnReadRDD.scala:210)
	at org.apache.hudi.HoodieMergeOnReadRDD.payloadCombineFileIterator(HoodieMergeOnReadRDD.scala:200)
	at org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:77)

 

Logs from local run: 

[https://gist.github.com/nsivabalan/656956ab313676617d84002ef8942198]

Diff with which the above logs were generated: 
[https://gist.github.com/nsivabalan/84dad29bc1ab567ebb6ee8c63b3969ec]

 

Steps to reproduce in spark-shell:
 # Create an MOR table with schema1.
 # Ingest (with schema1) until log files are created; verify via hudi-cli. It took me two batches of updates to see a log file.
 # Create a new schema2 with one additional field, and ingest a batch with schema2 that updates existing records.
 # Read the entire dataset.
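The failure in step 4 can be illustrated with a small sketch of Avro's schema-resolution rule. This is a simplified model in Python, not Hudi or Avro code; the field name evolvedField is taken from the stack trace, and the other field names are made up for illustration. The rule it models: a reader-schema field that is absent from the writer's data must carry a default, otherwise resolution fails.

```python
def resolve_record(writer_fields, reader_fields, record):
    """Simplified model of Avro schema resolution (illustration only).

    writer_fields: set of field names present in the written data.
    reader_fields: list of (name, default) pairs for the reader schema,
                   where default is {"value": ...} or None if no default.
    """
    out = {}
    for name, default in reader_fields:
        if name in writer_fields:
            # Field exists in the written data: take the writer's value.
            out[name] = record[name]
        elif default is not None:
            # Field is new in the reader schema but has a default: fill it in.
            out[name] = default["value"]
        else:
            # New field with no default: resolution fails, as in the trace.
            raise ValueError(f"missing required field {name}")
    return out


# Records in the old log file were written with schema1 (no evolvedField).
schema1_fields = {"uuid", "fare"}
old_record = {"uuid": "abc", "fare": 10.0}

# Reading old data with a schema2 whose evolvedField has no default
# raises ValueError("missing required field evolvedField"), mirroring
# the AvroTypeException in the stack trace:
#   resolve_record(schema1_fields,
#                  [("uuid", None), ("fare", None), ("evolvedField", None)],
#                  old_record)

# Declaring a default for the new field makes the same read succeed.
patched = resolve_record(
    schema1_fields,
    [("uuid", None), ("fare", None), ("evolvedField", {"value": None})],
    old_record,
)
```

In real Avro schemas, the equivalent of the second case is declaring the new field as a nullable union with `"default": null`, so that log blocks written before the evolution remain readable.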

 

 

 


> rt view w/ MOR tables fails after schema evolution
> --------------------------------------------------
>
>                 Key: HUDI-1716
>                 URL: https://issues.apache.org/jira/browse/HUDI-1716
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Storage Management
>            Reporter: sivabalan narayanan
>            Priority: Major
>              Labels: sev:critical, user-support-issues
>             Fix For: 0.9.0
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
