sivabalan narayanan created HUDI-5654:
-----------------------------------------
Summary: Metadata commits/read fails when data table has rollbacks
with empty completed file
Key: HUDI-5654
URL: https://issues.apache.org/jira/browse/HUDI-5654
Project: Apache Hudi
Issue Type: Bug
Components: writer-core
Reporter: sivabalan narayanan
The completed rollback file in the timeline (instant.rollback) is expected
to be non-empty. When the data table has a rollback whose completed file is
empty, metadata table commits and reads fail because the rollback metadata
cannot be parsed.
{code:java}
org.apache.hudi.exception.HoodieException: Unable to do hoodie metadata table validation in file:///Users/ljain/codebase/onehouse/local_test/tbl_cow_4_2/hoodie_table
at org.apache.hudi.utilities.HoodieMetadataTableValidator.run(HoodieMetadataTableValidator.java:369)
at org.apache.hudi.utilities.HoodieMetadataTableValidator.main(HoodieMetadataTableValidator.java:350)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hudi.exception.HoodieException: Error fetching partition paths from metadata table
at org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:308)
at org.apache.hudi.utilities.HoodieMetadataTableValidator.validatePartitions(HoodieMetadataTableValidator.java:529)
at org.apache.hudi.utilities.HoodieMetadataTableValidator.doMetadataTableValidation(HoodieMetadataTableValidator.java:437)
at org.apache.hudi.utilities.HoodieMetadataTableValidator.doHoodieMetadataTableValidationOnce(HoodieMetadataTableValidator.java:380)
at org.apache.hudi.utilities.HoodieMetadataTableValidator.run(HoodieMetadataTableValidator.java:366)
... 13 more
Caused by: org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve list of partition from metadata
at org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:123)
at org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:306)
... 17 more
Caused by: org.apache.hudi.exception.HoodieMetadataException: Error retrieving rollback commits for instant [20230119223110895__rollback__COMPLETED]
at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRollbackedCommits(HoodieBackedTableMetadata.java:576)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getValidInstantTimestamps$15(HoodieBackedTableMetadata.java:484)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.getValidInstantTimestamps(HoodieBackedTableMetadata.java:483)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.getLogRecordScanner(HoodieBackedTableMetadata.java:503)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.openReaders(HoodieBackedTableMetadata.java:438)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.getOrCreateReaders(HoodieBackedTableMetadata.java:423)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$2(HoodieBackedTableMetadata.java:227)
at java.util.HashMap.forEach(HashMap.java:1290)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:225)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:148)
at org.apache.hudi.metadata.BaseTableMetadata.fetchAllPartitionPaths(BaseTableMetadata.java:295)
at org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:121)
... 18 more
Caused by: org.apache.avro.InvalidAvroMagicException: Not an Avro data file
at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:57)
at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:207)
at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieRollbackMetadata(TimelineMetadataUtils.java:177)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRollbackedCommits(HoodieBackedTableMetadata.java:560)
... 38 more
{code}
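The innermost cause in the trace is Avro rejecting the rollback file as "Not an Avro data file", which is what one would expect for a file with no bytes. Below is a minimal sketch of how a reader might tell an empty completed rollback file apart from a well-formed one before attempting to deserialize it. RollbackFileCheck, Status and classify are hypothetical names used only for illustration; they are not part of Hudi or Avro, and this is not a proposed patch. The only external fact assumed is that Avro object container files begin with the four bytes 'O', 'b', 'j', 0x01.

```java
import java.util.Arrays;

/** Hypothetical helper (not Hudi code): classifies the raw bytes of a completed rollback file. */
final class RollbackFileCheck {

    // Avro object container files start with the 4-byte magic 'O', 'b', 'j', 0x01.
    private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    enum Status { EMPTY, NOT_AVRO, AVRO_CONTAINER }

    /** Inspects only the leading bytes; it does not parse or validate the Avro payload. */
    static Status classify(byte[] content) {
        if (content == null || content.length == 0) {
            // The case described in this issue: a completed rollback file with no content.
            return Status.EMPTY;
        }
        if (content.length < AVRO_MAGIC.length) {
            return Status.NOT_AVRO;
        }
        byte[] header = Arrays.copyOf(content, AVRO_MAGIC.length);
        return Arrays.equals(header, AVRO_MAGIC) ? Status.AVRO_CONTAINER : Status.NOT_AVRO;
    }

    public static void main(String[] args) {
        byte[] emptyFile = new byte[0];
        byte[] avroLike = {'O', 'b', 'j', 1, 0, 0};

        System.out.println("empty file: " + classify(emptyFile));
        System.out.println("avro-like file: " + classify(avroLike));
    }
}
```

With a check along these lines, a caller could decide to skip or specially handle an EMPTY rollback instant instead of passing it to the Avro reader. Whether skipping is the right behavior for the metadata table is a design question this sketch does not answer.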
--
This message was sent by Atlassian Jira
(v8.20.10#820010)