[ 
https://issues.apache.org/jira/browse/HUDI-8887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davis Zhang updated HUDI-8887:
------------------------------
    Description: 
related https://issues.apache.org/jira/browse/HUDI-7483

It is clear that when the test errors out with the file system lock provider 
(LP), the instant files do conflict.

This is verified by the following unit test:

 
{code:java}
// Copy and paste into
// hudi-oss/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestSimpleConcurrentFileWritesConflictResolutionStrategy.java

@Test
public void test22() throws Exception {
  // Build a REQUESTED compaction instant and an INFLIGHT delta commit instant.
  HoodieInstant compact = new HoodieInstant(HoodieInstant.State.REQUESTED,
      HoodieTimeline.COMPACTION_ACTION, "20250117104717388",
      InstantComparatorV2.REQUESTED_TIME_BASED_COMPARATOR);
  HoodieInstant ds = new HoodieInstant(HoodieInstant.State.INFLIGHT,
      HoodieTimeline.DELTA_COMMIT_ACTION, "20250117104722625",
      InstantComparatorV2.REQUESTED_TIME_BASED_COMPARATOR);
  // Wrap both instants as concurrent operations against the test's meta client.
  ConcurrentOperation cc = new ConcurrentOperation(compact, metaClient);
  ConcurrentOperation cds = new ConcurrentOperation(ds, metaClient);
  SimpleConcurrentFileWritesConflictResolutionStrategy strategy =
      new SimpleConcurrentFileWritesConflictResolutionStrategy();
  // The strategy is expected to report a conflict between the two operations.
  Assertions.assertTrue(strategy.hasConflict(cc, cds));
}
{code}
First, pause the test at its first line (e.g. with a breakpoint), then copy the 
.hoodie folder from 7483.zip/repro/.hoodie, available at 
https://issues.apache.org/jira/browse/HUDI-7483, to the base path of the meta 
client used by this test. Resume the test; it passes.

 

This means that, given the compaction plan and the delta commit instant, the 
conflict resolution strategy does its job. The only remaining explanation is 
that the lock does not uphold the exclusive lock owner invariant, which allows 
the OCC validation phase to run concurrently.
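
The exclusive lock owner invariant can be probed directly, independent of the 
conflict resolution strategy. The following is a minimal sketch, not Hudi 
code: LockInvariantCheck and maxConcurrentHolders are hypothetical names, and 
ReentrantLock stands in as a known-good in-process lock. To probe the file 
system lock provider, it would have to be wrapped in a 
java.util.concurrent.locks.Lock adapter, which is not shown here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper, not part of Hudi: measures how many threads hold a lock at once. */
public class LockInvariantCheck {

  /**
   * Runs {@code threads} workers that each acquire {@code lock} {@code rounds} times
   * and returns the highest number of simultaneous holders observed.
   * A lock that upholds mutual exclusion yields 1.
   */
  public static int maxConcurrentHolders(Lock lock, int threads, int rounds) {
    AtomicInteger holders = new AtomicInteger();
    AtomicInteger maxSeen = new AtomicInteger();
    CountDownLatch start = new CountDownLatch(1);
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        try {
          start.await(); // release all workers together to maximize contention
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        for (int r = 0; r < rounds; r++) {
          lock.lock();
          try {
            // Count this thread as a holder and record the peak holder count.
            int now = holders.incrementAndGet();
            maxSeen.accumulateAndGet(now, Math::max);
            Thread.yield(); // widen the window inside the critical section
          } finally {
            holders.decrementAndGet();
            lock.unlock();
          }
        }
      });
      workers[i].start();
    }
    start.countDown();
    for (Thread w : workers) {
      try {
        w.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for workers", e);
      }
    }
    return maxSeen.get();
  }

  public static void main(String[] args) {
    // ReentrantLock is the known-good baseline for this sketch.
    int max = maxConcurrentHolders(new ReentrantLock(), 4, 1000);
    System.out.println("max concurrent holders = " + max);
  }
}
```

Any result above 1 would mean two threads were inside the critical section at 
the same time, which is the failure mode suspected here. A result of 1 does 
not prove the lock is correct, since the race may simply not have been hit in 
that run.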

 

Given that the in-process lock provider test dimension has been very stable for 
a long time, we strongly suspect that the file system lock does not work 
correctly on the OS/Docker container we are using. Disable the file system lock 
provider test dimension to avoid false alarms in Java CI.



> FileSystem lock provider can have more than 1 threads hold the lock at the 
> same time
> ------------------------------------------------------------------------------------
>
>                 Key: HUDI-8887
>                 URL: https://issues.apache.org/jira/browse/HUDI-8887
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Davis Zhang
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
