[ 
https://issues.apache.org/jira/browse/HUDI-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517227#comment-17517227
 ] 

Ethan Guo commented on HUDI-3637:
---------------------------------

After revisiting the relevant logic, I can confirm that the compaction and
clustering logic is correct in using getLatestFileSlices(). The mismatch in this
case should not cause any correctness issue, and it should be handled at the
validation layer.

At a high level, getLatestFileSlices() fetches the latest file slices for
committed base files and filters out any file slice whose base instant time is
uncommitted. The latest file slices may still include uncommitted log files;
these are skipped during log reading and merging by the following logic in
"AbstractHoodieLogRecordReader":
{code:java}
if (logBlock.getBlockType() != CORRUPT_BLOCK && logBlock.getBlockType() != COMMAND_BLOCK) {
  if (!completedInstantsTimeline.containsOrBeforeTimelineStarts(instantTime)
      || inflightInstantsTimeline.containsInstant(instantTime)) {
    // hit an uncommitted block possibly from a failed write, move to the next one and skip processing this one
    continue;
  }
  if (instantRange.isPresent() && !instantRange.get().isInRange(instantTime)) {
    // filter the log block by instant range
    continue;
  }
} {code}
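To make the two filtering steps above concrete, here is a minimal, self-contained Java sketch. It is not Hudi's implementation: the FileSlice and LogBlock records and the methods latestCommittedSlice, shouldSkip, and blocksToMerge are hypothetical, the timeline is modeled as plain sets of instant times, containsOrBeforeTimelineStarts is reduced to a membership test, and the instant-range filter is omitted. Instant times are short fixed-width strings that compare lexicographically, standing in for timestamps like 20220314001058266.

{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

public class FileSliceFilterSketch {

  // Hypothetical model: a log block carries the instant time of the write that produced it.
  record LogBlock(String instantTime) {}

  // Hypothetical model: a file slice is a base instant time plus the log blocks appended after it.
  record FileSlice(String fileId, String baseInstantTime, List<LogBlock> logBlocks) {}

  // Modeled on getLatestFileSlices(): drop slices whose base instant is uncommitted,
  // then keep the latest remaining slice of the file group.
  static Optional<FileSlice> latestCommittedSlice(List<FileSlice> slices, Set<String> completed) {
    return slices.stream()
        .filter(s -> completed.contains(s.baseInstantTime()))
        .max(Comparator.comparing(FileSlice::baseInstantTime));
  }

  // Modeled on the reader check above: skip a data block whose instant is not completed
  // or is still inflight (containsOrBeforeTimelineStarts is reduced to plain membership here).
  static boolean shouldSkip(LogBlock block, Set<String> completed, Set<String> inflight) {
    return !completed.contains(block.instantTime()) || inflight.contains(block.instantTime());
  }

  // The log blocks that actually get merged when reading the slice.
  static List<LogBlock> blocksToMerge(FileSlice slice, Set<String> completed, Set<String> inflight) {
    return slice.logBlocks().stream()
        .filter(b -> !shouldSkip(b, completed, inflight))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Hypothetical timeline: 001 and 002 completed, 003 inflight, 004 never committed.
    Set<String> completed = Set.of("001", "002");
    Set<String> inflight = Set.of("003");
    List<FileSlice> slices = List.of(
        new FileSlice("fg1", "001", List.of(new LogBlock("002"), new LogBlock("003"))),
        new FileSlice("fg1", "004", List.of()));

    FileSlice latest = latestCommittedSlice(slices, completed).orElseThrow();
    System.out.println(latest.baseInstantTime());                   // 001
    System.out.println(blocksToMerge(latest, completed, inflight)); // [LogBlock[instantTime=002]]
  }
}
{code}

The slice based on the uncommitted instant 004 is dropped, so the latest slice is the one based on 001. The block from the inflight instant 003 stays in that slice, matching the "uncommitted log files may be included" behavior, but it is never merged.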
At the concurrency control layer, when two concurrent commits try to touch
the same file group, one of them fails, which guarantees correctness.
Take the following three cases as examples:

> Case 1
{code:java}
writer 1: DC1 (inflight) lf1 added                                        -> about to commit, conflict resolution: DC1 fails
writer 2:                           schedule compaction (include bf1 lf1){code}
Writer 1 starts a deltacommit (DC1), which is inflight, and writes log file 1
(lf1). After that, writer 2 schedules a compaction that includes base file 1
(bf1) and the corresponding log file 1.
When DC1 is about to commit later on, conflict resolution detects that DC1
touches the same file group as the compaction, so DC1 fails.
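The check in Case 1 can be pictured as a file-group overlap test. A minimal sketch, assuming file groups are identified by plain strings; hasConflict is a hypothetical helper, not Hudi's conflict resolution API, and the sketch ignores how the set of instants to compare against is chosen.

{code:java}
import java.util.Collections;
import java.util.Set;

public class ConflictCheckSketch {

  // Hypothetical check: a committing write conflicts with another operation
  // when both touch at least one common file group.
  static boolean hasConflict(Set<String> committingFileGroups, Set<String> otherFileGroups) {
    return !Collections.disjoint(committingFileGroups, otherFileGroups);
  }

  public static void main(String[] args) {
    // Case 1: DC1 wrote lf1 in file group fg1; the compaction scheduled by writer 2 covers bf1 + lf1 in fg1.
    Set<String> dc1FileGroups = Set.of("fg1");
    Set<String> pendingCompactionFileGroups = Set.of("fg1");
    System.out.println(hasConflict(dc1FileGroups, pendingCompactionFileGroups)); // true: DC1 fails

    // A write that only touches fg2 does not overlap the compaction and can commit.
    System.out.println(hasConflict(Set.of("fg2"), pendingCompactionFileGroups)); // false
  }
}
{code}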

> Case 2
{code:java}
writer 1: DC1 (inflight) lf1 added                                      -> about to commit, conflict resolution: DC1 fails. DC1 is rolled back
writer 2:                         schedule compaction (include bf1 lf1)                                                                           execution{code}
Writer 1 starts a deltacommit (DC1), which is inflight, and writes log file 1
(lf1). After that, writer 2 schedules a compaction that includes base file 1
(bf1) and the corresponding log file 1. When DC1 is about to commit later on,
conflict resolution detects that DC1 touches the same file group as the
compaction, so DC1 fails. DC1 is then rolled back, and a rollback command block
is added to the file group, so DC1 no longer exists in the timeline. When the
compaction is executed later on, log file 1 is still excluded by the if
condition above, because DC1 is not in the completed instants timeline.
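Case 2 hinges on DC1 moving from inflight to absent. A hypothetical model of those timeline transitions follows; the Timeline class, its methods, and the instant name C0 are assumptions for illustration only, and the real rollback (including the command block) is not modeled. It shows that a block from DC1 is skipped both before and after the rollback.

{code:java}
import java.util.HashSet;
import java.util.Set;

public class RollbackTimelineSketch {

  // Hypothetical active timeline: instant times split into completed and inflight sets.
  static final class Timeline {
    private final Set<String> completed = new HashSet<>();
    private final Set<String> inflight = new HashSet<>();

    void startInflight(String instant) {
      inflight.add(instant);
    }

    void complete(String instant) {
      inflight.remove(instant);
      completed.add(instant);
    }

    // Rolling back a failed instant removes it from the timeline entirely.
    void rollback(String instant) {
      inflight.remove(instant);
      completed.remove(instant);
    }

    // Same shape as the reader check: skip a block whose instant is not completed or is inflight.
    boolean skipsBlockFrom(String instant) {
      return !completed.contains(instant) || inflight.contains(instant);
    }
  }

  public static void main(String[] args) {
    Timeline timeline = new Timeline();
    timeline.complete("C0");        // hypothetical earlier commit that wrote bf1
    timeline.startInflight("DC1");  // DC1 writes lf1
    System.out.println(timeline.skipsBlockFrom("DC1")); // true while DC1 is inflight

    timeline.rollback("DC1");       // DC1 fails conflict resolution and is rolled back
    System.out.println(timeline.skipsBlockFrom("DC1")); // true: DC1 is gone, lf1 still excluded
    System.out.println(timeline.skipsBlockFrom("C0"));  // false: committed data is still read
  }
}
{code}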

> Case 3
{code:java}
writer 1: DC1 (inflight) lf1 added                                                                       -> about to commit, conflict resolution: DC1 fails
writer 2:                      schedule compaction (include bf1 lf1) commit (excluding lf1) succeeds{code}
Writer 1 starts a deltacommit (DC1), which is inflight, and writes log file 1
(lf1). After that, writer 2 schedules a compaction that includes base file 1
(bf1) and the corresponding log file 1.
When the compaction is executed, log file 1 is excluded because the instant
time in its log block is DC1, which is still inflight. When DC1 is about to
commit later on, conflict resolution detects that DC1 touches the same file
group as the compaction, so DC1 fails.
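The ordering in Case 3 can be traced in two steps: the compaction drops DC1's block while merging, then DC1's own commit attempt fails. A minimal sketch under stated assumptions: the LogBlock record, the methods mergedByCompaction and failsConflictCheck, and the instant name C0 are hypothetical, and the timeline is again modeled as plain sets.

{code:java}
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class Case3OrderingSketch {

  // Hypothetical log block: only the instant time recorded in its header matters here.
  record LogBlock(String instantTime) {}

  // Compaction execution merges only blocks whose instant is completed and not inflight.
  static List<LogBlock> mergedByCompaction(List<LogBlock> planned, Set<String> completed, Set<String> inflight) {
    return planned.stream()
        .filter(b -> completed.contains(b.instantTime()) && !inflight.contains(b.instantTime()))
        .collect(Collectors.toList());
  }

  // A later commit attempt fails if it shares a file group with the already committed compaction.
  static boolean failsConflictCheck(Set<String> writerFileGroups, Set<String> committedCompactionFileGroups) {
    return !Collections.disjoint(writerFileGroups, committedCompactionFileGroups);
  }

  public static void main(String[] args) {
    Set<String> completed = Set.of("C0");  // hypothetical commit that wrote bf1
    Set<String> inflight = Set.of("DC1");  // DC1 wrote lf1 but has not committed

    // Step 1: the compaction plan lists lf1, but its block (instant DC1) is dropped at execution time.
    List<LogBlock> merged = mergedByCompaction(List.of(new LogBlock("DC1")), completed, inflight);
    System.out.println(merged); // []

    // Step 2: the compaction commits for fg1; DC1 then tries to commit to fg1 and fails.
    System.out.println(failsConflictCheck(Set.of("fg1"), Set.of("fg1"))); // true
  }
}
{code}

Because the compaction output never contains DC1's data, failing DC1 afterwards leaves the table consistent.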

> Check file listing from FS vs metadata table when compaction in pending and inflight
> ------------------------------------------------------------------------------------
>
>                 Key: HUDI-3637
>                 URL: https://issues.apache.org/jira/browse/HUDI-3637
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> HoodieMetadataTableValidator validation of the latest base files and file
> slices fails due to the following (log files are missing from the MT,
> compared to the FS view). The validation failure may be due to the inflight
> compaction. Need to investigate whether this affects the file listing for
> write operations. The behavior is that after some instants, the validation
> can pass, so the correctness of the MT is guaranteed, but the file listing
> view may have a bug.
> {code:java}
> file slices from metadata: [FileSlice {fileGroupId=HoodieFileGroupId{partitionPath='2022/1/28', fileId='769bf7ac-d6d0-452c-bf54-bbe7e8381766-0'}, baseCommitTime=20220314001058266, baseFile='HoodieBaseFile{fullPath=file:/Users/ethan/Work/scripts/mt_rollout_testing/deploy_c_multi_writer/c2_mor_010nomt_011mt/test_table/2022/1/28/769bf7ac-d6d0-452c-bf54-bbe7e8381766-0_2-47-485_20220314001058266.parquet, fileLen=106839698, BootstrapBaseFile=null}', logFiles='[]'}]
> file slices from file system and base files: [FileSlice {fileGroupId=HoodieFileGroupId{partitionPath='2022/1/28', fileId='769bf7ac-d6d0-452c-bf54-bbe7e8381766-0'}, baseCommitTime=20220314001058266, baseFile='HoodieBaseFile{fullPath=file:/Users/ethan/Work/scripts/mt_rollout_testing/deploy_c_multi_writer/c2_mor_010nomt_011mt/test_table/2022/1/28/769bf7ac-d6d0-452c-bf54-bbe7e8381766-0_2-47-485_20220314001058266.parquet, fileLen=106839698, BootstrapBaseFile=null}', logFiles='[HoodieLogFile{pathStr='file:/Users/ethan/Work/scripts/mt_rollout_testing/deploy_c_multi_writer/c2_mor_010nomt_011mt/test_table/2022/1/28/.769bf7ac-d6d0-452c-bf54-bbe7e8381766-0_20220314001058266.log.1_2-111-954', fileLen=51607682}]'}]
> 22/03/14 00:33:03 ERROR HoodieMetadataTableValidator: Metadata table validation failed for 2022/1/28 due to HoodieValidationException {code}
> Compaction:
> {code:java}
>  Partition Path │ FileId                                 │ Base-Instant      │ Data File Path                                                            │ Total Delta Files │ getMetrics                                                                                                                  ║
> ╠══
>  2022/1/28      │ 769bf7ac-d6d0-452c-bf54-bbe7e8381766-0 │ 20220314001058266 │ 769bf7ac-d6d0-452c-bf54-bbe7e8381766-0_2-47-485_20220314001058266.parquet │ 1                 │ {TOTAL_LOG_FILES=1.0, TOTAL_IO_READ_MB=151.0, TOTAL_LOG_FILES_SIZE=5.1607682E7, TOTAL_IO_WRITE_MB=101.0, TOTAL_IO_MB=252.0} ║ {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
