wwbbcjeyc opened a new issue, #5879:
URL: https://github.com/apache/hudi/issues/5879
Setup/Env config:
Flink: 1.13.5
Hudi: 0.10.1
Hudi Flink Config:
'connector' = 'hudi',
'path' = '-------',
'table.type' = 'MERGE_ON_READ',
'hoodie.datasource.write.recordkey.field' = 'track_id',
'hoodie.parquet.max.file.size' = '268435456',
'hoodie.parquet.small.file.limit' = '134217728',
'write.task.max.size' = '2048',
'write.merge.max_memory' = '1024',
'write.precombine.field' = 'time',
'write.bucket_assign.tasks' = '20',
'write.tasks' = '50',
'compaction.tasks' = '5',
'compaction.async.enabled' = 'true',
'compaction.schedule.enabled' = 'true',
'compaction.trigger.strategy' = 'num_commits',
'compaction.delta_commits' = '5',
'changelog.enabled' = 'true',
'index.state.ttl' = '0.5',
'hive_sync.enable' = 'true',
'hive_sync.mode' = 'hms',
'hive_sync.metastore.uris' = '----',
'hive_sync.table' = '--',
'hive_sync.db' = '-------'
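For context, these options belong in the WITH clause of a Flink SQL CREATE TABLE statement. Below is a minimal sketch of how such a sink could be declared through the Flink 1.13 Table API; the table name, column list, partition column, and path are hypothetical (the report only names the record key field track_id and the precombine field time), and only a subset of the options above is repeated:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class HudiSinkSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Hypothetical schema: only track_id (record key) and time (precombine
        // field) are named in the report; `partition` is an assumed partition column.
        tEnv.executeSql(
            "CREATE TABLE hudi_sink (\n"
          + "  track_id STRING,\n"
          + "  `time` BIGINT,\n"
          + "  `partition` STRING\n"
          + ") PARTITIONED BY (`partition`) WITH (\n"
          + "  'connector' = 'hudi',\n"
          + "  'path' = 'hdfs://.../table_path',\n"   // redacted in the report
          + "  'table.type' = 'MERGE_ON_READ',\n"
          + "  'hoodie.datasource.write.recordkey.field' = 'track_id',\n"
          + "  'write.precombine.field' = 'time',\n"
          + "  'changelog.enabled' = 'true',\n"
          + "  'compaction.async.enabled' = 'true',\n"
          + "  'compaction.delta_commits' = '5'\n"
          + ")");
    }
}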
The job starts and runs normally, but after a period of execution it fails with the following error.
Exception trace during upsert:
2022-06-09 12:48:52
org.apache.hudi.exception.HoodieUpsertException: Error upsetting bucketType UPDATE for partition :20220606
    at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.handleUpsertPartition(BaseFlinkCommitActionExecutor.java:204)
    at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:108)
    at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:70)
    at org.apache.hudi.table.action.commit.FlinkWriteHelper.write(FlinkWriteHelper.java:73)
    at org.apache.hudi.table.action.commit.delta.FlinkUpsertDeltaCommitActionExecutor.execute(FlinkUpsertDeltaCommitActionExecutor.java:49)
    at org.apache.hudi.table.HoodieFlinkMergeOnReadTable.upsert(HoodieFlinkMergeOnReadTable.java:72)
    at org.apache.hudi.client.HoodieFlinkWriteClient.upsert(HoodieFlinkWriteClient.java:148)
    at org.apache.hudi.sink.StreamWriteFunction.lambda$initWriteFunction$66(StreamWriteFunction.java:184)
    at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:426)
    at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:386)
    at org.apache.hudi.sink.bucket.BucketStreamWriteFunction.processElement(BucketStreamWriteFunction.java:129)
    at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:205)
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:681)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:636)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:620)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve files in partition hdfs://xxxx/xxxx/20220606 from metadata
    at org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:137)
    at org.apache.hudi.metadata.HoodieMetadataFileSystemView.listPartition(HoodieMetadataFileSystemView.java:65)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$148(AbstractTableFileSystemView.java:304)
    at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:295)
    at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getLatestFileSlice(AbstractTableFileSystemView.java:609)
    at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:103)
    at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getLatestFileSlice(PriorityBasedFileSystemView.java:264)
    at org.apache.hudi.io.HoodieAppendHandle.init(HoodieAppendHandle.java:143)
    at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:378)
    at org.apache.hudi.table.action.commit.delta.BaseFlinkDeltaCommitActionExecutor.handleUpdate(BaseFlinkDeltaCommitActionExecutor.java:52)
    at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.handleUpsertPartition(BaseFlinkCommitActionExecutor.java:196)
    ... 24 more
Caused by: org.apache.hudi.exception.HoodieMetadataException: Error retrieving rollback commits for instant [20220609122503195__rollback__COMPLETED]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRollbackedCommits(HoodieBackedTableMetadata.java:547)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getValidInstantTimestamps$382(HoodieBackedTableMetadata.java:451)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getValidInstantTimestamps(HoodieBackedTableMetadata.java:450)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getLogRecordScanner(HoodieBackedTableMetadata.java:475)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getLogRecordScanner(HoodieBackedTableMetadata.java:461)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.openReaders(HoodieBackedTableMetadata.java:407)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getOrCreateReaders$379(HoodieBackedTableMetadata.java:393)
    at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getOrCreateReaders(HoodieBackedTableMetadata.java:393)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$369(HoodieBackedTableMetadata.java:202)
    at java.util.HashMap.forEach(HashMap.java:1289)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:200)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:140)
    at org.apache.hudi.metadata.BaseTableMetadata.fetchAllFilesInPartition(BaseTableMetadata.java:312)
    at org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:135)
    ... 35 more
Caused by: org.apache.hudi.org.apache.avro.InvalidAvroMagicException: Not an Avro data file
    at org.apache.hudi.org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:56)
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:204)
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieRollbackMetadata(TimelineMetadataUtils.java:174)
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRollbackedCommits(HoodieBackedTableMetadata.java:531)
    ... 58 more
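The innermost InvalidAvroMagicException is the root cause: while scanning valid instants, the metadata table deserializes the completed rollback instant 20220609122503195, and that file does not start with the Avro magic header. A likely explanation is a truncated or zero-byte rollback file on the timeline, e.g. left behind by a writer that died mid-write. Below is a minimal diagnostic sketch, assuming a Hadoop client on the classpath and assuming the instant file lives at <basePath>/.hoodie/20220609122503195.rollback (the base path is redacted in the report, so this path is illustrative):

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckAvroMagic {
    // Avro data files must begin with the 4-byte magic: 'O', 'b', 'j', 1.
    private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    public static void main(String[] args) throws Exception {
        // Hypothetical location of the failing rollback instant file; substitute
        // the real table base path.
        Path file = new Path("hdfs://xxxx/xxxx/.hoodie/20220609122503195.rollback");
        FileSystem fs = file.getFileSystem(new Configuration());

        FileStatus status = fs.getFileStatus(file);
        System.out.println("file length: " + status.getLen());
        if (status.getLen() < AVRO_MAGIC.length) {
            System.out.println("too short to be an Avro data file (likely truncated or empty)");
            return;
        }

        // Read the first four bytes and compare against the Avro magic.
        byte[] header = new byte[AVRO_MAGIC.length];
        try (FSDataInputStream in = fs.open(file)) {
            in.readFully(0, header);
        }
        System.out.println("valid Avro magic: " + Arrays.equals(header, AVRO_MAGIC));
    }
}

If the length is 0 or the magic check fails, every metadata-table read will presumably keep hitting this exception until the corrupt instant file is dealt with.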