[
https://issues.apache.org/jira/browse/HUDI-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461731#comment-17461731
]
Manoj Govindassamy commented on HUDI-3031:
------------------------------------------
Related - https://issues.apache.org/jira/browse/HUDI-3043
> TestHoodieDeltaStreamerWithMultiWriter time out due to async services and
> writer deadlock
> -----------------------------------------------------------------------------------------
>
> Key: HUDI-3031
> URL: https://issues.apache.org/jira/browse/HUDI-3031
> Project: Apache Hudi
> Issue Type: Task
> Components: Writer Core
> Reporter: Manoj Govindassamy
> Assignee: Manoj Govindassamy
> Priority: Blocker
> Fix For: 0.11.0
>
>
> Of late, TestHoodieDeltaStreamerWithMultiWriter has been failing consistently
> for the MOR table type. The test spins off a few pool threads to do table
> ingestion via backfilling along with async compaction and clustering. After
> the data ingestion is completed, the test waits endlessly for the
> following condition to pass.
>
> {code:java}
> // Condition for parallel ingestion job
> Function<Boolean, Boolean> conditionForRegularIngestion = (r) -> {
>   if (tableType.equals(HoodieTableType.MERGE_ON_READ)) {
>     TestHoodieDeltaStreamer.TestHelpers.assertAtleastNDeltaCommitsAfterCommit(3,
>         lastSuccessfulCommit, tableBasePath, fs());
>   } else {
>     TestHoodieDeltaStreamer.TestHelpers.assertAtleastNCompactionCommitsAfterCommit(3,
>         lastSuccessfulCommit, tableBasePath, fs());
>   }
>   TestHoodieDeltaStreamer.TestHelpers.assertRecordCount(totalRecords,
>       tableBasePath + "/*/*.parquet", sqlContext());
>   TestHoodieDeltaStreamer.TestHelpers.assertDistanceCount(totalRecords,
>       tableBasePath + "/*/*.parquet", sqlContext());
>   return true;
> };
> {code}
> Issue 1: The compaction thread and the writer thread are in deadlock
> {code:java}
> "async_compact_thread" #188 prio=5 os_prio=31 tid=0x00007f8c26266800 nid=0x15803 waiting for monitor entry [0x0000700009d3e000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.hudi.client.transaction.TransactionManager.endTransaction(TransactionManager.java:70)
>     - waiting to lock <0x00000006c353f528> (a org.apache.hudi.client.transaction.TransactionManager)
>     at org.apache.hudi.client.SparkRDDWriteClient.completeCompaction(SparkRDDWriteClient.java:312)
>     at org.apache.hudi.client.SparkRDDWriteClient.commitCompaction(SparkRDDWriteClient.java:294)
>     at org.apache.hudi.client.HoodieSparkCompactor.compact(HoodieSparkCompactor.java:59)
>     at org.apache.hudi.async.AsyncCompactService.lambda$null$1(AsyncCompactService.java:89)
>     at org.apache.hudi.async.AsyncCompactService$$Lambda$612/2034420774.get(Unknown Source)
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
> "pool-22-thread-1" #143 prio=5 os_prio=31 tid=0x00007f8c0b125800 nid=0x12603 waiting on condition [0x0000700006fb7000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hudi.client.transaction.FileSystemBasedLockProviderTestClass.tryLock(FileSystemBasedLockProviderTestClass.java:80)
>     at org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:68)
>     at org.apache.hudi.client.transaction.TransactionManager.beginTransaction(TransactionManager.java:64)
>     - locked <0x00000006c353f528> (a org.apache.hudi.client.transaction.TransactionManager)
>     at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:193)
>     at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:125)
>     at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:536)
>     at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:308)
> {code}
> Issue 2: Even after fixing the above by replacing the
> hoodie.write.lock.provider with the local lock provider, the end condition of
> at least 3 delta commits after the last successful commit is not met and the
> test times out. This needs further investigation.
>
>
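One plausible reading of the thread dump under Issue 1: both threads share one TransactionManager instance (the same monitor address appears in both traces), the writer holds that monitor inside beginTransaction while it sleeps and polls for the lock, and the compaction thread needs the same monitor in endTransaction, presumably to release that lock. The sketch below reproduces only that pattern, with bounded timeouts so it terminates. PollingLock, SketchTransactionManager, and the timeout values are hypothetical stand-ins and not Hudi code.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class TxnDeadlockSketch {

    /** Hypothetical stand-in for a polling lock provider: a flag retried with sleep. */
    static final class PollingLock {
        private final AtomicBoolean held = new AtomicBoolean(false);

        boolean tryLock(long timeoutMs) throws InterruptedException {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            while (!held.compareAndSet(false, true)) {
                if (System.nanoTime() - deadline >= 0) {
                    return false; // gave up: the lock was never released in time
                }
                Thread.sleep(10); // thread shows up as TIMED_WAITING (sleeping)
            }
            return true;
        }

        void unlock() {
            held.set(false);
        }
    }

    /** Hypothetical manager whose begin and end methods share one object monitor. */
    static final class SketchTransactionManager {
        private final PollingLock lock = new PollingLock();

        // Polls for the lock while still holding this object's monitor.
        synchronized boolean beginTransaction(long timeoutMs) throws InterruptedException {
            return lock.tryLock(timeoutMs);
        }

        // Needs the same monitor before it can release the lock.
        synchronized void endTransaction() {
            lock.unlock();
        }
    }

    /** Returns true if the writer timed out while the "compactor" was stuck in endTransaction. */
    static boolean reproduce() {
        try {
            SketchTransactionManager txn = new SketchTransactionManager();

            // The calling thread plays the compactor and takes the lock first.
            if (!txn.beginTransaction(100)) {
                return false;
            }

            AtomicBoolean writerGotLock = new AtomicBoolean(true);
            Thread writer = new Thread(() -> {
                try {
                    writerGotLock.set(txn.beginTransaction(500));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "writer");
            writer.start();

            // Wait until the writer is sleeping inside tryLock, i.e. holding the monitor.
            while (writer.getState() != Thread.State.TIMED_WAITING
                    && writer.getState() != Thread.State.TERMINATED) {
                Thread.sleep(1);
            }

            // The compactor now tries to finish; it blocks on the monitor the writer holds.
            long start = System.nanoTime();
            txn.endTransaction();
            long blockedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

            writer.join();
            return !writerGotLock.get() && blockedMs >= 200;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("pattern reproduced: " + reproduce());
    }
}
```

In this sketch the writer gives up after 500 ms, which is the only reason the program ends. With a lock acquisition that retries without a bound, neither thread could make progress.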
--
This message was sent by Atlassian Jira
(v8.20.1#820001)