konwu created HUDI-3105:
---------------------------

             Summary: Flink bootstrap causes Invalid Hoodie Table error
                 Key: HUDI-3105
                 URL: https://issues.apache.org/jira/browse/HUDI-3105
             Project: Apache Hudi
          Issue Type: Bug
          Components: Flink Integration
            Reporter: konwu


Steps to reproduce:
 * Start a Flink job with the bootstrap index enabled.
 * The job hits an error before its first successful checkpoint.
 * Restart the job, again with the bootstrap index enabled.

The restarted job then fails with:
org.apache.hudi.exception.InvalidTableException: Invalid Hoodie Table. viewfs://xx/xx/flight_order_info
  at org.apache.hudi.common.table.TableSchemaResolver.lambda$getTableParquetSchemaFromDataFile$0(TableSchemaResolver.java:88)
  at org.apache.hudi.common.util.Option.orElseThrow(Option.java:123)
  at org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchemaFromDataFile(TableSchemaResolver.java:88)
  at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:153)
  at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:187)
  at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:163)
  at org.apache.hudi.sink.bootstrap.BootstrapFunction.loadRecords(BootstrapFunction.java:160)
  at org.apache.hudi.sink.bootstrap.BootstrapFunction.processElement(BootstrapFunction.java:110)
  at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
  at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:187)
  at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:204)
  at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:174)
  at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
  at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:395)
  at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:191)
  at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:609)
  at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:573)
  at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)
  at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)
  at java.lang.Thread.run(Thread.java:748)
 
Possible fix, see:
[https://github.com/apache/hudi/blob/c81df99e50f2df84d85f08ff3a839595dad974d7/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java?_pjax=%23js-repo-pjax-container%2C%20div%5Bitemtype%3D%22http%3A%2F%2Fschema.org%2FSoftwareSourceCode%22%5D%20main%2C%20%5Bdata-pjax-container%5D#L183]

Maybe we need to move the getTableAvroSchema call inside the if condition there, so that it is no longer invoked unconditionally.
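
A rough sketch of the idea, not the actual Hudi code: every name below (BootstrapSchemaSketch, resolveSchema, schemaForBootstrap) is hypothetical, and it assumes the if condition in question checks whether the table already has a completed commit. The schema lookup is passed in as a Supplier and is only invoked inside that check, so a table with no completed commit never reaches the call that throws.

```java
import java.util.Optional;
import java.util.function.Supplier;

public class BootstrapSchemaSketch {

    /**
     * Hypothetical stand-in for a schema lookup that reads the schema from a
     * data file and fails when the table has no data files yet.
     */
    static String resolveSchema(boolean tableHasDataFiles) {
        if (!tableHasDataFiles) {
            throw new IllegalStateException("Invalid Hoodie Table.");
        }
        return "avro-schema";
    }

    /**
     * Resolves the schema only when a completed instant exists (assumed to be
     * what the "if condition" checks). Without one, the resolver is never
     * called and there is nothing to bootstrap.
     */
    static Optional<String> schemaForBootstrap(Optional<String> latestCompletedInstant,
                                               Supplier<String> schemaResolver) {
        if (latestCompletedInstant.isPresent()) {
            // Safe to resolve here: a completed commit implies data was written.
            return Optional.of(schemaResolver.get());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Job restarted before the first successful checkpoint: no completed instant.
        Optional<String> none =
            schemaForBootstrap(Optional.empty(), () -> resolveSchema(false));
        System.out.println("no commit, schema present: " + none.isPresent());

        // Table with a completed instant (the instant time is a made-up value).
        Optional<String> some =
            schemaForBootstrap(Optional.of("20211224000000"), () -> resolveSchema(true));
        System.out.println("with commit, schema present: " + some.isPresent());
    }
}
```

With the lookup outside the check, the first case would throw before the check is ever evaluated, which matches the failure pattern in the stack trace above.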



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
