Ethan Guo created HUDI-3637:
-------------------------------
Summary: Check file listing from FS vs metadata table when
compaction is pending and inflight
Key: HUDI-3637
URL: https://issues.apache.org/jira/browse/HUDI-3637
Project: Apache Hudi
Issue Type: Task
Reporter: Ethan Guo
HoodieMetadataTableValidator validation of the latest base files and file
slices fails with the output below. The failure may be caused by the inflight
compaction: the file slice listed from the metadata table has no log files,
while the same slice listed from the file system includes one log file. We need
to investigate whether this affects the file listing used by write operations.
After a few more instants the validation passes, so the correctness of the
metadata table (MT) is guaranteed, but the file listing view may have a bug.
{code:java}
file slices from metadata: [FileSlice
{fileGroupId=HoodieFileGroupId{partitionPath='2022/1/28',
fileId='769bf7ac-d6d0-452c-bf54-bbe7e8381766-0'},
baseCommitTime=20220314001058266,
baseFile='HoodieBaseFile{fullPath=file:/Users/ethan/Work/scripts/mt_rollout_testing/deploy_c_multi_writer/c2_mor_010nomt_011mt/test_table/2022/1/28/769bf7ac-d6d0-452c-bf54-bbe7e8381766-0_2-47-485_20220314001058266.parquet,
fileLen=106839698, BootstrapBaseFile=null}', logFiles='[]'}]
file slices from file system and base files: [FileSlice
{fileGroupId=HoodieFileGroupId{partitionPath='2022/1/28',
fileId='769bf7ac-d6d0-452c-bf54-bbe7e8381766-0'},
baseCommitTime=20220314001058266,
baseFile='HoodieBaseFile{fullPath=file:/Users/ethan/Work/scripts/mt_rollout_testing/deploy_c_multi_writer/c2_mor_010nomt_011mt/test_table/2022/1/28/769bf7ac-d6d0-452c-bf54-bbe7e8381766-0_2-47-485_20220314001058266.parquet,
fileLen=106839698, BootstrapBaseFile=null}',
logFiles='[HoodieLogFile{pathStr='file:/Users/ethan/Work/scripts/mt_rollout_testing/deploy_c_multi_writer/c2_mor_010nomt_011mt/test_table/2022/1/28/.769bf7ac-d6d0-452c-bf54-bbe7e8381766-0_20220314001058266.log.1_2-111-954',
fileLen=51607682}]'}]
22/03/14 00:33:03 ERROR HoodieMetadataTableValidator: Metadata table validation failed for 2022/1/28 due to HoodieValidationException
{code}
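To make the mismatch above concrete, here is a minimal sketch of the comparison, assuming a simplified stand-in for a file slice. The `Slice` class and `logFilesMissingFromMetadata` method are hypothetical names used only for illustration; they are not Hudi's `FileSlice` API or the validator's actual logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class FileSliceDiffSketch {

    // Hypothetical, simplified stand-in for a file slice (not Hudi's FileSlice class).
    static final class Slice {
        final String fileId;
        final String baseInstant;
        final List<String> logFiles;

        Slice(String fileId, String baseInstant, List<String> logFiles) {
            this.fileId = fileId;
            this.baseInstant = baseInstant;
            this.logFiles = logFiles;
        }
    }

    // Returns the log files that the file-system listing contains
    // but the metadata-table listing does not.
    static List<String> logFilesMissingFromMetadata(Slice fromMetadata, Slice fromFileSystem) {
        if (!fromMetadata.fileId.equals(fromFileSystem.fileId)
                || !fromMetadata.baseInstant.equals(fromFileSystem.baseInstant)) {
            throw new IllegalArgumentException("slices differ in file ID or base instant");
        }
        List<String> missing = new ArrayList<>(fromFileSystem.logFiles);
        missing.removeAll(fromMetadata.logFiles);
        return missing;
    }

    public static void main(String[] args) {
        // Values taken from the validation output above.
        String fileId = "769bf7ac-d6d0-452c-bf54-bbe7e8381766-0";
        String instant = "20220314001058266";
        Slice fromMetadata = new Slice(fileId, instant, Collections.<String>emptyList());
        Slice fromFileSystem = new Slice(fileId, instant,
                Arrays.asList("." + fileId + "_" + instant + ".log.1_2-111-954"));
        // Prints the single log file that only the file-system listing reports.
        System.out.println(logFilesMissingFromMetadata(fromMetadata, fromFileSystem));
    }
}
```

Both listings agree on the file ID, base instant, and base file; only the log file list differs, which is why the sketch diffs that field alone.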
Compaction:
{code:java}
Partition Path │ FileId │ Base-Instant │ Data File Path │ Total Delta Files │ getMetrics
2022/1/28 │ 769bf7ac-d6d0-452c-bf54-bbe7e8381766-0 │ 20220314001058266 │ 769bf7ac-d6d0-452c-bf54-bbe7e8381766-0_2-47-485_20220314001058266.parquet │ 1 │ {TOTAL_LOG_FILES=1.0, TOTAL_IO_READ_MB=151.0, TOTAL_LOG_FILES_SIZE=5.1607682E7, TOTAL_IO_WRITE_MB=101.0, TOTAL_IO_MB=252.0}
{code}