[
https://issues.apache.org/jira/browse/HUDI-8887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Davis Zhang updated HUDI-8887:
------------------------------
Description:
Related to https://issues.apache.org/jira/browse/HUDI-7483
It is clear that when the test errors out with the file system lock provider
(LP), the instant files do conflict.
This is verified by the following unit test:
{code:java}
// Copy-paste into
// hudi-oss/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestSimpleConcurrentFileWritesConflictResolutionStrategy.java
@Test
public void test22() throws Exception {
  // Pending instants under test: a requested compaction and an inflight delta commit
  HoodieInstant compact = new HoodieInstant(HoodieInstant.State.REQUESTED,
      HoodieTimeline.COMPACTION_ACTION, "20250117104717388",
      InstantComparatorV2.REQUESTED_TIME_BASED_COMPARATOR);
  HoodieInstant ds = new HoodieInstant(HoodieInstant.State.INFLIGHT,
      HoodieTimeline.DELTA_COMMIT_ACTION, "20250117104722625",
      InstantComparatorV2.REQUESTED_TIME_BASED_COMPARATOR);
  ConcurrentOperation cc = new ConcurrentOperation(compact, metaClient);
  ConcurrentOperation cds = new ConcurrentOperation(ds, metaClient);
  SimpleConcurrentFileWritesConflictResolutionStrategy strategy =
      new SimpleConcurrentFileWritesConflictResolutionStrategy();
  // The strategy is expected to report a conflict between the two operations
  Assertions.assertTrue(strategy.hasConflict(cc, cds));
}
{code}
First pause the test at its first line, then copy the .hoodie folder from
7483.zip/repro/.hoodie, available in
https://issues.apache.org/jira/browse/HUDI-7483, to the base path of the meta
client used by this test. Resume the test and it passes.
This means that, given the compaction plan and the delta commit instant, the
conflict resolution strategy does its job. The only remaining explanation is
that the lock does not uphold the exclusive lock owner invariant, which allows
the OCC validation phase to run concurrently.
Given that the in-process lock provider test dimension has been very stable for
a long time, we strongly suspect that the file system lock does not work
correctly on some of the OS/Docker container setups we use. Disable this test
dimension to avoid false alarms in Java CI.
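The exclusive lock owner invariant discussed above can be probed directly. The following is a minimal sketch, not code from Hudi: LockInvariantProbe and maxConcurrentHolders are hypothetical names, and java.util.concurrent.locks.ReentrantLock is used as a stand-in for the lock under test. Wiring in a Hudi lock provider is an assumption left to the reader. The probe records the largest number of threads observed inside the critical section at the same time; a lock that upholds mutual exclusion should never exceed 1.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical probe (not part of Hudi): measures how many threads are
// inside the critical section guarded by the given lock at the same time.
class LockInvariantProbe {

  static int maxConcurrentHolders(Lock lock, int threads, int iterations)
      throws InterruptedException {
    AtomicInteger holders = new AtomicInteger();    // threads currently holding the lock
    AtomicInteger maxHolders = new AtomicInteger(); // highest value of holders ever seen
    CountDownLatch start = new CountDownLatch(1);   // releases all workers together
    ExecutorService pool = Executors.newFixedThreadPool(threads);

    for (int t = 0; t < threads; t++) {
      pool.submit(() -> {
        try {
          start.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        for (int i = 0; i < iterations; i++) {
          lock.lock();
          try {
            int now = holders.incrementAndGet();
            maxHolders.accumulateAndGet(now, Math::max);
            Thread.yield(); // widen the window in which an overlap could be observed
          } finally {
            holders.decrementAndGet();
            lock.unlock();
          }
        }
      });
    }

    start.countDown();
    pool.shutdown();
    if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
      pool.shutdownNow();
      throw new IllegalStateException("probe did not finish in time");
    }
    return maxHolders.get();
  }

  public static void main(String[] args) throws InterruptedException {
    // ReentrantLock is only a stand-in here; substitute the lock under suspicion.
    int max = maxConcurrentHolders(new ReentrantLock(), 4, 1000);
    System.out.println("max concurrent holders: " + max);
  }
}
```

A result above 1 would indicate that the lock let two owners in at once, which is the failure mode suspected for the file system lock. A result of 1 does not prove correctness, since the overlap may simply not have been triggered in that run.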
> FileSystem lock provider can have more than 1 threads hold the lock at the
> same time
> ------------------------------------------------------------------------------------
>
> Key: HUDI-8887
> URL: https://issues.apache.org/jira/browse/HUDI-8887
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Davis Zhang
> Priority: Major
>