Manoj Govindassamy created HUDI-3031:
----------------------------------------
Summary: TestHoodieDeltaStreamerWithMultiWriter time out due to
async services and writer deadlock
Key: HUDI-3031
URL: https://issues.apache.org/jira/browse/HUDI-3031
Project: Apache Hudi
Issue Type: Task
Components: Writer Core
Reporter: Manoj Govindassamy
Assignee: Manoj Govindassamy
Fix For: 0.11.0
Of late, TestHoodieDeltaStreamerWithMultiWriter has been failing consistently
for the MOR table type. The test spins up a few pool threads to ingest into the
table via backfilling, alongside async compaction and clustering. After
the data ingestion completes, the test waits endlessly for the following
condition to pass.
{code:java}
// Condition for parallel ingestion job
Function<Boolean, Boolean> conditionForRegularIngestion = (r) -> {
  if (tableType.equals(HoodieTableType.MERGE_ON_READ)) {
    TestHoodieDeltaStreamer.TestHelpers.assertAtleastNDeltaCommitsAfterCommit(3,
        lastSuccessfulCommit, tableBasePath, fs());
  } else {
    TestHoodieDeltaStreamer.TestHelpers.assertAtleastNCompactionCommitsAfterCommit(3,
        lastSuccessfulCommit, tableBasePath, fs());
  }
  TestHoodieDeltaStreamer.TestHelpers.assertRecordCount(totalRecords,
      tableBasePath + "/*/*.parquet", sqlContext());
  TestHoodieDeltaStreamer.TestHelpers.assertDistanceCount(totalRecords,
      tableBasePath + "/*/*.parquet", sqlContext());
  return true;
};
{code}
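For readers unfamiliar with this test pattern, the following is a minimal sketch of how an assertion-style condition can be polled until it passes or a deadline expires. The class and method names (ConditionWaitSketch, waitUntil) are hypothetical and are not the helpers used by the Hudi test; the sketch only illustrates why a condition that never becomes true turns into a timeout rather than an immediate failure.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not part of the Hudi test utilities.
class ConditionWaitSketch {

    /**
     * Polls the check until it returns true without throwing, or until
     * timeoutMs has elapsed. Returns false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier check, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            try {
                if (check.getAsBoolean()) {
                    return true;
                }
            } catch (AssertionError | RuntimeException e) {
                // Condition not met yet; retry until the deadline.
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // The condition becomes true on the third poll.
        System.out.println(waitUntil(() -> ++calls[0] >= 3, 1000, 1));
    }
}
```

If the assertion inside the check can never succeed, as with the delta commit count described in this issue, the loop simply runs until the deadline.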
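Issue 1: The compaction thread and the writer thread are deadlocked. The writer
thread holds the TransactionManager monitor inside beginTransaction while it
sleeps in tryLock, and the compaction thread is blocked on the same monitor in
endTransaction.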
{code:java}
"async_compact_thread" #188 prio=5 os_prio=31 tid=0x00007f8c26266800 nid=0x15803 waiting for monitor entry [0x0000700009d3e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hudi.client.transaction.TransactionManager.endTransaction(TransactionManager.java:70)
    - waiting to lock <0x00000006c353f528> (a org.apache.hudi.client.transaction.TransactionManager)
    at org.apache.hudi.client.SparkRDDWriteClient.completeCompaction(SparkRDDWriteClient.java:312)
    at org.apache.hudi.client.SparkRDDWriteClient.commitCompaction(SparkRDDWriteClient.java:294)
    at org.apache.hudi.client.HoodieSparkCompactor.compact(HoodieSparkCompactor.java:59)
    at org.apache.hudi.async.AsyncCompactService.lambda$null$1(AsyncCompactService.java:89)
    at org.apache.hudi.async.AsyncCompactService$$Lambda$612/2034420774.get(Unknown Source)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

"pool-22-thread-1" #143 prio=5 os_prio=31 tid=0x00007f8c0b125800 nid=0x12603 waiting on condition [0x0000700006fb7000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hudi.client.transaction.FileSystemBasedLockProviderTestClass.tryLock(FileSystemBasedLockProviderTestClass.java:80)
    at org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:68)
    at org.apache.hudi.client.transaction.TransactionManager.beginTransaction(TransactionManager.java:64)
    - locked <0x00000006c353f528> (a org.apache.hudi.client.transaction.TransactionManager)
    at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:193)
    at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:125)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:536)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:308)
{code}
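The monitor contention in the trace can be reproduced in isolation. The sketch below is one plausible reading of it, assuming the compaction thread already holds the polled lock when it calls endTransaction (the trace does not show this directly). All class names here (TransactionDeadlockSketch, PollingLock, SketchTransactionManager) are hypothetical stand-ins, not the Hudi classes, and the writer's wait is bounded by a timeout so that the demo terminates instead of hanging.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-ins for illustration only; these are not the Hudi classes.
class TransactionDeadlockSketch {

    /** Lock acquired by polling with sleeps, similar in spirit to a file-based lock. */
    static final class PollingLock {
        private final AtomicBoolean held = new AtomicBoolean(false);

        boolean tryLock(long timeoutMs) {
            long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
            while (!held.compareAndSet(false, true)) {
                if (System.nanoTime() >= deadline) {
                    return false;
                }
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }

        void unlock() {
            held.set(false);
        }
    }

    /** Both methods share one monitor, so a sleeping acquirer blocks the releaser. */
    static final class SketchTransactionManager {
        private final PollingLock lock = new PollingLock();

        synchronized boolean beginTransaction(long timeoutMs) {
            return lock.tryLock(timeoutMs); // may sleep while owning the monitor
        }

        synchronized void endTransaction() {
            lock.unlock(); // cannot run until the monitor is free
        }
    }

    /** Returns true when the writer timed out because the holder could not release. */
    static boolean reproduce(long writerTimeoutMs) {
        SketchTransactionManager txn = new SketchTransactionManager();
        // This thread plays the compactor: it begins a transaction and holds the lock.
        if (!txn.beginTransaction(0)) {
            throw new IllegalStateException("lock should be free initially");
        }
        AtomicBoolean writerGotLock = new AtomicBoolean(true);
        Thread writer = new Thread(
            () -> writerGotLock.set(txn.beginTransaction(writerTimeoutMs)), "writer");
        writer.start();
        try {
            // TIMED_WAITING means the writer is sleeping inside tryLock, monitor held.
            while (writer.isAlive() && writer.getState() != Thread.State.TIMED_WAITING) {
                Thread.sleep(5);
            }
            txn.endTransaction(); // blocked until the writer gives up
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !writerGotLock.get();
    }

    public static void main(String[] args) {
        System.out.println("writer starved while release was blocked: " + reproduce(300));
    }
}
```

In this sketch the writer can never obtain the lock within its timeout, because the only thread able to release it is waiting for the monitor the writer owns. Without a timeout, both threads would wait indefinitely.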
Issue 2: Even after fixing the above by replacing the
hoodie.write.lock.provider with the local lock provider, the end condition of at
least 3 delta commits after the last successful commit is not met and the test
times out. This needs further investigation.