[
https://issues.apache.org/jira/browse/HUDI-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
KnightChess closed HUDI-5797.
-----------------------------
Resolution: Cannot Reproduce
> bulk insert as row will throw error without mdt init
> ----------------------------------------------------
>
> Key: HUDI-5797
> URL: https://issues.apache.org/jira/browse/HUDI-5797
> Project: Apache Hudi
> Issue Type: Bug
> Components: spark
> Reporter: KnightChess
> Assignee: KnightChess
> Priority: Major
> Labels: pull-request-available
>
> `bulk insert as row` does not call initTable first, so the metadata table (mdt)
> is initialized lazily when the write result is committed within the same job.
> That initialization lists files through the fileSystem, so it can pick up
> orphan or corrupt files. For example, if a writer is killed by the RM before
> flushing, it may leave behind a 0-byte parquet file, which triggers the
> following error when the mdt is initialized.
>
> {code:java}
> Job aborted due to stage failure: Task 1 in stage 13.0 failed 4 times, most recent failure: Lost task 1.3 in stage 13.0 (TID 102100) (bigdata-nmg-hdp10339.nmg01.diditaxi.com executor 832): java.lang.IllegalStateException
> at org.apache.hudi.common.util.ValidationUtils.checkState(ValidationUtils.java:53)
> at org.apache.hudi.metadata.HoodieMetadataPayload.lambda$null$4(HoodieMetadataPayload.java:328)
> at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1321)
> at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
> at java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1683)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> at org.apache.hudi.metadata.HoodieMetadataPayload.lambda$createPartitionFilesRecord$5(HoodieMetadataPayload.java:323)
> at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
> at org.apache.hudi.metadata.HoodieMetadataPayload.createPartitionFilesRecord(HoodieMetadataPayload.java:321)
> at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.lambda$getFilesPartitionRecords$f70c2081$1(HoodieBackedTableMetadataWriter.java:1105)
> at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
> at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1892)
> at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1249)
> at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1249)
> at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2261)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> at org.apache.spark.scheduler.Task.run(Task.scala:131)
> at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1463)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
>
>
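The failure mode can be illustrated with a minimal, self-contained sketch. This is NOT Hudi's actual code; `ZeroSizeFileCheck` and `toFileSizeMap` are hypothetical names, and only the shape of the failing invariant (a `checkState`-style positive-file-size check applied to a filesystem listing, as in the stack trace above) is taken from the issue:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: why a 0-byte data file left behind by a killed
// writer trips a checkState-style invariant when the metadata table is
// bootstrapped from a raw filesystem listing.
public class ZeroSizeFileCheck {

    // Mirrors the shape of ValidationUtils.checkState: throws
    // IllegalStateException when the invariant does not hold.
    static void checkState(boolean condition) {
        if (!condition) {
            throw new IllegalStateException();
        }
    }

    // Builds a file-name -> size map from a listing; every file is
    // expected to have a positive size, as a fully flushed parquet
    // file would.
    static Map<String, Long> toFileSizeMap(Map<String, Long> listing) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Long> e : listing.entrySet()) {
            checkState(e.getValue() > 0); // 0-byte orphan file fails here
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> listing = new HashMap<>();
        listing.put("part-0001.parquet", 1024L);
        listing.put("part-0002.parquet", 0L); // writer killed before flush
        try {
            toFileSizeMap(listing);
        } catch (IllegalStateException e) {
            System.out.println("init from listing failed: " + e);
        }
    }
}
```

Initializing the table up front (before the bulk insert runs) avoids building the mdt from such a listing, which is why the ordering matters.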
--
This message was sent by Atlassian Jira
(v8.20.10#820010)