[ https://issues.apache.org/jira/browse/HUDI-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063573#comment-17063573 ]
lamber-ken edited comment on HUDI-716 at 3/20/20, 6:51 PM:
-----------------------------------------------------------

Hi [~vbalaji], I agree with your point. From the code we can see that it creates the new file first, then writes the content.

[https://github.com/apache/incubator-hudi/blob/release-0.5.0/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java]

!image-2020-03-21-02-45-25-099.png!


was (Author: lamber-ken):
Hi [~vbalaji], I agree with your point. From the code we can see that it creates the new file first, then writes the content.

HoodieActiveTimeline#createFileInPath

!image-2020-03-21-02-45-25-099.png!


> Exception: Not an Avro data file when running HoodieCleanClient.runClean
> ------------------------------------------------------------------------
>
>                 Key: HUDI-716
>                 URL: https://issues.apache.org/jira/browse/HUDI-716
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>          Components: DeltaStreamer
>            Reporter: Alexander Filipchik
>            Assignee: lamber-ken
>            Priority: Major
>             Fix For: 0.6.0
>
>         Attachments: image-2020-03-21-02-45-25-099.png
>
>
> Just upgraded to upstream master from 0.5 and seeing an issue at the end of
> the delta sync run:
>
> 20/03/17 02:13:49 ERROR HoodieDeltaStreamer: Got error running delta sync once. Shutting down
> 20/03/17 02:13:49 ERROR HoodieDeltaStreamer: Got error running delta sync once. Shutting down
> org.apache.hudi.exception.HoodieIOException: Not an Avro data file
>   at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:144)
>   at org.apache.hudi.client.HoodieCleanClient.lambda$clean$0(HoodieCleanClient.java:88)
>   at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
>   at org.apache.hudi.client.HoodieCleanClient.clean(HoodieCleanClient.java:86)
>   at org.apache.hudi.client.HoodieWriteClient.clean(HoodieWriteClient.java:843)
>   at org.apache.hudi.client.HoodieWriteClient.postCommit(HoodieWriteClient.java:520)
>   at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:168)
>   at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:111)
>   at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:395)
>   at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:237)
>   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:121)
>   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:294)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
>   at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>   at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.IOException: Not an Avro data file
>   at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
>   at org.apache.hudi.common.util.AvroUtils.deserializeAvroMetadata(AvroUtils.java:147)
>   at org.apache.hudi.common.util.CleanerUtils.getCleanerPlan(CleanerUtils.java:87)
>   at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:141)
>   ... 24 more
>
> It is attempting to read an old cleanup file (2 months old) and crashing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
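
For readers following the thread, here is a minimal sketch of the create-then-write sequence described in the comment above. It is not Hudi code and uses only the JDK. The class name, the helper methods, and the file name `example.clean` are hypothetical and chosen for illustration. The sketch assumes the failure mode under discussion is a file that exists but is still empty when it is read. An Avro container file starts with the four bytes 'O', 'b', 'j', 1, so an empty file cannot pass that header check. That is presumably the kind of check that produces "Not an Avro data file" in the trace.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Hudi.
class CreateThenWriteSketch {

    // Magic bytes at the start of every Avro object container file.
    static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    // Step 1 of the pattern from the comment: the (empty) file is created first.
    static Path createEmpty(Path dir, String name) throws IOException {
        return Files.createFile(dir.resolve(name));
    }

    // Step 2: the content is written in a separate, later call.
    static void writeContent(Path file, byte[] content) throws IOException {
        Files.write(file, content);
    }

    // Returns true only if the file begins with the Avro magic bytes.
    static boolean startsWithAvroMagic(Path file) throws IOException {
        byte[] header = new byte[AVRO_MAGIC.length];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) {
                    break; // end of file reached before a full header was read
                }
                total += n;
            }
        }
        return total == header.length && Arrays.equals(header, AVRO_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("timeline-sketch");
        Path file = createEmpty(dir, "example.clean"); // hypothetical file name

        // If the writer stops here, a reader finds a zero-length file.
        System.out.println("after create: " + startsWithAvroMagic(file));

        // Placeholder payload: magic bytes plus one filler byte.
        // This is NOT a complete Avro file; it only satisfies the header check.
        writeContent(file, new byte[] {'O', 'b', 'j', 1, 0});
        System.out.println("after write:  " + startsWithAvroMagic(file));
    }
}
```

If the process dies between the two steps, the empty file stays on disk, which would be consistent with the old cleanup file mentioned in the report. Whether that is what actually happened in this issue is an assumption.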