KnightChess opened a new issue, #10504: URL: https://github.com/apache/hudi/issues/10504
**Describe the problem you faced**

The task is stuck at the metadata update phase for a very long time. It happened while updating the metadata table (MDT), and I cannot consistently reproduce it. In the thread dump, the task thread is waiting on a lock. When I reran the job, it succeeded much faster.

```shell
"Executor task launch worker for task 0.0 in stage 59.0 (TID 208)" #63 daemon prio=5 os_prio=0 tid=0x00000000011f0000 nid=0xc522d in Object.wait() [0x00007f0c2df17000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000006c5205158> (a java.lang.UNIXProcess)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.UNIXProcess.waitFor(UNIXProcess.java:396)
    - locked <0x00000006c5205158> (a java.lang.UNIXProcess)
    at org.apache.hudi.org.openjdk.jol.vm.sa.ServiceabilityAgentSupport.callAgent(ServiceabilityAgentSupport.java:190)
    at org.apache.hudi.org.openjdk.jol.vm.sa.ServiceabilityAgentSupport.needSudo(ServiceabilityAgentSupport.java:109)
    at org.apache.hudi.org.openjdk.jol.vm.sa.ServiceabilityAgentSupport.<init>(ServiceabilityAgentSupport.java:88)
    at org.apache.hudi.org.openjdk.jol.vm.sa.ServiceabilityAgentSupport.instance(ServiceabilityAgentSupport.java:77)
    at org.apache.hudi.org.openjdk.jol.vm.VM.current(VM.java:77)
    at org.apache.hudi.org.openjdk.jol.info.GraphWalker.walk(GraphWalker.java:97)
    at org.apache.hudi.org.openjdk.jol.info.GraphLayout.parseInstance(GraphLayout.java:54)
    at org.apache.hudi.common.util.ObjectSizeCalculator.getObjectSize(ObjectSizeCalculator.java:57)
    at org.apache.hudi.common.util.DefaultSizeEstimator.sizeEstimate(DefaultSizeEstimator.java:32)
    at org.apache.hudi.io.HoodieAppendHandle.init(HoodieAppendHandle.java:189)
    at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:434)
    at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:83)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:343)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:265)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor$$Lambda$993/1409746200.call(Unknown Source)
    at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike$$Lambda$994/672032971.apply(Unknown Source)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:911)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:911)
    at org.apache.spark.rdd.RDD$$Lambda$995/716628863.apply(Unknown Source)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:369)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:333)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:369)
    at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:382)
    at org.apache.spark.rdd.RDD$$Lambda$835/632399394.apply(Unknown Source)
    at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
    at org.apache.spark.storage.BlockManager$$Lambda$618/303167871.apply(Unknown Source)
    at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:380)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:369)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:333)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    at org.apache.spark.executor.Executor$TaskRunner$$Lambda$475/438794158.apply(Unknown Source)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1463)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```

**To Reproduce**

**Expected behavior**

The task can be completed quickly.

**Environment Description**

* Hudi version : 0.13.1
* Spark version : 3.2.0
* Hive version :
* Hadoop version :
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) :
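
For readers of the thread dump: the top frames show the task thread parked in `UNIXProcess.waitFor()`, called from `ServiceabilityAgentSupport.callAgent` in the shaded JOL classes. In other words, the thread is blocked in an unbounded wait for a child process to exit. The sketch below is not Hudi or JOL code. It is a minimal, standalone illustration of the difference between that unbounded wait and a bounded one using the standard `Process.waitFor(long, TimeUnit)`. The class name `BoundedProcessWait`, the method `runWithTimeout`, and the `sleep` command are hypothetical choices for the example, and it assumes a Unix-like system.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from Hudi or JOL.
public class BoundedProcessWait {

    /**
     * Starts a child process and waits at most the given time for it to exit.
     * Returns the exit code, or an empty OptionalInt if the child had to be killed.
     */
    public static OptionalInt runWithTimeout(List<String> command, long timeout, TimeUnit unit)
            throws IOException, InterruptedException {
        // inheritIO() avoids blocking the child on a full, unread output pipe.
        Process process = new ProcessBuilder(command).inheritIO().start();

        // Process.waitFor() with no arguments would block until the child exits,
        // however long that takes. The timed overload returns false on timeout.
        if (process.waitFor(timeout, unit)) {
            return OptionalInt.of(process.exitValue());
        }

        // Timed out: kill the child and give it a moment to be reaped.
        process.destroyForcibly().waitFor(1, TimeUnit.SECONDS);
        return OptionalInt.empty();
    }

    public static void main(String[] args) throws Exception {
        // "sleep 30" stands in for a child process that does not return promptly.
        OptionalInt result =
                runWithTimeout(Arrays.asList("sleep", "30"), 500, TimeUnit.MILLISECONDS);
        System.out.println(result.isPresent()
                ? "exited with " + result.getAsInt()
                : "timed out, child killed");
    }
}
```

Whether a bounded wait is appropriate inside the size-estimation path is a separate design question. The sketch only shows why a thread in this state can stay in `WAITING` for as long as the child process does not exit.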
