konwu created HUDI-3105:
---------------------------
Summary: Flink bootstrap causes Invalid Hoodie Table error
Key: HUDI-3105
URL: https://issues.apache.org/jira/browse/HUDI-3105
Project: Apache Hudi
Issue Type: Bug
Components: Flink Integration
Reporter: konwu
Environment / steps to reproduce:
* Start a Flink task with the bootstrap index enabled
* Hit an error before the first successful checkpoint
* Restart the task, again with the bootstrap index enabled
The restarted task then fails with:
org.apache.hudi.exception.InvalidTableException: Invalid Hoodie Table. viewfs://xx/xx/flight_order_info
    at org.apache.hudi.common.table.TableSchemaResolver.lambda$getTableParquetSchemaFromDataFile$0(TableSchemaResolver.java:88)
    at org.apache.hudi.common.util.Option.orElseThrow(Option.java:123)
    at org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchemaFromDataFile(TableSchemaResolver.java:88)
    at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:153)
    at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:187)
    at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:163)
    at org.apache.hudi.sink.bootstrap.BootstrapFunction.loadRecords(BootstrapFunction.java:160)
    at org.apache.hudi.sink.bootstrap.BootstrapFunction.processElement(BootstrapFunction.java:110)
    at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:187)
    at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:204)
    at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:174)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:395)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:191)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:609)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:573)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)
    at java.lang.Thread.run(Thread.java:748)
To resolve, see:
[https://github.com/apache/hudi/blob/c81df99e50f2df84d85f08ff3a839595dad974d7/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java#L183]
Maybe we need to move the getTableAvroSchema call inside the if condition, so that the schema is presumably only resolved once the table has data to read it from.
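
The suggestion above can be illustrated with a minimal, self-contained sketch. It does not use Hudi classes: loadRecords, the Supplier-based schema resolver, the local InvalidTableException, and the commit timestamp are all hypothetical stand-ins, and the assumption that the "if condition" checks for a completed commit is a guess that has not been verified against BootstrapOperator. The point is only that a schema lookup deferred into the guarded branch is never attempted on a table that has no successful commit yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

public class LazySchemaBootstrapSketch {

    /** Local stand-in for the exception seen in the stack trace; not Hudi's class. */
    static class InvalidTableException extends RuntimeException {
        InvalidTableException(String basePath) {
            super("Invalid Hoodie Table. " + basePath);
        }
    }

    /**
     * Hypothetical stand-in for the index-loading step.
     * The schema resolver is invoked only inside the guarded branch,
     * i.e. only when a completed commit is present (assumed condition).
     */
    static List<String> loadRecords(Optional<String> latestCommit,
                                    Supplier<String> schemaResolver) {
        List<String> loaded = new ArrayList<>();
        if (latestCommit.isPresent()) {
            // Resolving the schema here means an empty table never triggers it.
            String schema = schemaResolver.get();
            loaded.add("commit=" + latestCommit.get() + ", schema=" + schema);
        }
        return loaded;
    }

    public static void main(String[] args) {
        // Simulates a table with no data files: resolving the schema would throw.
        Supplier<String> failingResolver = () -> {
            throw new InvalidTableException("viewfs://xx/xx/flight_order_info");
        };

        // No completed commit: the resolver is never called, so nothing is thrown.
        System.out.println(loadRecords(Optional.empty(), failingResolver));

        // A completed commit exists: the schema is resolved and used.
        System.out.println(
            loadRecords(Optional.of("20211224000000"), () -> "flight_order_info_schema"));
    }
}
```

If the schema were resolved before the check instead, the first call in main would throw, which would match the failure seen after restarting a task that never completed a checkpoint.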