Davis Zhang created HUDI-8887:
---------------------------------
Summary: FileSystem lock provider can have more than 1 threads
hold the lock at the same time
Key: HUDI-8887
URL: https://issues.apache.org/jira/browse/HUDI-8887
Project: Apache Hudi
Issue Type: Bug
Reporter: Davis Zhang
Related: https://issues.apache.org/jira/browse/HUDI-7483
When the test errors out with the file system lock provider (LP), the instants
involved clearly do conflict.
This is verified by the following unit test:
{code:java}
// Copy-paste into
// hudi-oss/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestSimpleConcurrentFileWritesConflictResolutionStrategy.java
@Test
public void test22() throws Exception {
  // Step 1: build the two instants under test: a requested compaction
  // and an inflight delta commit.
  HoodieInstant compact = new HoodieInstant(HoodieInstant.State.REQUESTED,
      HoodieTimeline.COMPACTION_ACTION, "20250117104717388",
      InstantComparatorV2.REQUESTED_TIME_BASED_COMPARATOR);
  HoodieInstant ds = new HoodieInstant(HoodieInstant.State.INFLIGHT,
      HoodieTimeline.DELTA_COMMIT_ACTION, "20250117104722625",
      InstantComparatorV2.REQUESTED_TIME_BASED_COMPARATOR);
  // Step 2: wrap each instant as a concurrent operation on the same table.
  ConcurrentOperation cc = new ConcurrentOperation(compact, metaClient);
  ConcurrentOperation cds = new ConcurrentOperation(ds, metaClient);
  // Step 3: the strategy must report a conflict between the two operations.
  SimpleConcurrentFileWritesConflictResolutionStrategy strategy =
      new SimpleConcurrentFileWritesConflictResolutionStrategy();
  Assertions.assertTrue(strategy.hasConflict(cc, cds));
} {code}
First pause the test at its first line, then copy the .hoodie folder from
7483.zip/repro/.hoodie to the base path of the meta client used by this test.
Resume the test and it passes.
This means that, given the compaction plan and the delta commit instant, the
conflict resolution strategy does its job. The only remaining explanation is
that the lock does not uphold the exclusive lock owner invariant, which lets
the OCC validation phase run concurrently in more than one thread.
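The invariant in question can be probed directly. The following is only a minimal sketch, not code from Hudi: the class MutualExclusionProbe and its method are hypothetical names, and it assumes the lock under test can be exposed as a java.util.concurrent.locks.Lock. A ReentrantLock stands in for the lock provider here. Several threads repeatedly take the lock, and the probe records the largest number of threads seen inside the critical section at once. Any value above 1 means the exclusive owner invariant is broken.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper for illustration; not part of Hudi.
class MutualExclusionProbe {

  // Returns the highest number of threads observed holding the lock at once.
  public static int maxConcurrentHolders(Lock lock, int threads, int iterations)
      throws InterruptedException {
    AtomicInteger holders = new AtomicInteger();
    AtomicInteger maxHolders = new AtomicInteger();
    CountDownLatch start = new CountDownLatch(1);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.submit(() -> {
        try {
          start.await(); // release all workers together to maximize contention
          for (int i = 0; i < iterations; i++) {
            lock.lock();
            try {
              int now = holders.incrementAndGet();
              maxHolders.accumulateAndGet(now, Math::max);
              Thread.yield(); // widen the window in which an overlap could show up
              holders.decrementAndGet();
            } finally {
              lock.unlock();
            }
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    start.countDown();
    pool.shutdown();
    if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
      pool.shutdownNow();
      throw new IllegalStateException("workers did not finish in time");
    }
    return maxHolders.get();
  }

  public static void main(String[] args) throws InterruptedException {
    // ReentrantLock is a stand-in; swap in the lock under test (assumption: it implements Lock).
    int max = maxConcurrentHolders(new ReentrantLock(), 4, 1000);
    System.out.println("max concurrent holders: " + max);
  }
}
```

A probe like this only detects overlap when the race actually occurs during the run, so a result of 1 does not prove the lock is correct, while a result above 1 is conclusive.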
Given that the in-process lock provider test dimension has been very stable for
a long time, we strongly suspect that the file system lock does not work
correctly on the OS/Docker container we are using. Disable this test dimension
to avoid false alarms in the Java CI.