[
https://issues.apache.org/jira/browse/HUDI-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Guo updated HUDI-5919:
----------------------------
Fix Version/s: 0.14.0
(was: 0.13.1)
> Fix the validation of partition listing in metadata table validator
> -------------------------------------------------------------------
>
> Key: HUDI-5919
> URL: https://issues.apache.org/jira/browse/HUDI-5919
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.14.0
>
>
> In HoodieMetadataTableValidator, we compare the partition listing from the
> metadata table (MDT) with the partition listing from the file system:
> {code:java}
> // ignore partitions created by uncommitted ingestion.
> allPartitionPathsFromFS = allPartitionPathsFromFS.stream().parallel().filter(part -> {
>   HoodiePartitionMetadata hoodiePartitionMetadata =
>       new HoodiePartitionMetadata(metaClient.getFs(), FSUtils.getPartitionPath(basePath, part));
>   Option<String> instantOption = hoodiePartitionMetadata.readPartitionCreatedCommitTime();
>   if (instantOption.isPresent()) {
>     String instantTime = instantOption.get();
>     return completedTimeline.containsOrBeforeTimelineStarts(instantTime);
>   } else {
>     return false;
>   }
> }).collect(Collectors.toList());
>
> List<String> allPartitionPathsMeta =
>     FSUtils.getAllPartitionPaths(engineContext, basePath, true, cfg.assumeDatePartitioning);
> Collections.sort(allPartitionPathsFromFS);
> Collections.sort(allPartitionPathsMeta);
> if (allPartitionPathsFromFS.size() != allPartitionPathsMeta.size()
>     || !allPartitionPathsFromFS.equals(allPartitionPathsMeta)) {
>   String message = "Compare Partitions Failed! " + "AllPartitionPathsFromFS : "
>       + allPartitionPathsFromFS + " and allPartitionPathsMeta : " + allPartitionPathsMeta;
>   LOG.error(message);
>   throw new HoodieValidationException(message);
> }
> {code}
> When deciding which partitions from the file system to include in the
> comparison, we look at the commit that created the partition, as recorded in
> the partition metadata.
> {code:java}
> if (instantOption.isPresent()) {
>   String instantTime = instantOption.get();
>   return completedTimeline.containsOrBeforeTimelineStarts(instantTime);
> } else {
>   return false;
> }
> {code}
> In the following scenario, the validation job raises a false alarm, reporting
> that the partition lists returned by the file system and the metadata table
> do not match, because of this check:
> - Commit C1 creates the partition and writes the partition metadata, but
> fails while writing data files. After C1 is rolled back, C2 adds new data to
> the same partition. The partition metadata still records C1 as the created
> commit time, since Hudi does not rewrite the partition metadata in C2.
> Because C1 is not a completed commit, the check filters the partition out of
> the file system listing, although C2 has committed data to it.
>
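The scenario above can be reproduced with a small, self-contained sketch. It does not use Hudi classes: `PartitionValidationSketch`, `filterByCreatedCommit`, the partition path `2023/03/01`, and the in-memory maps are all hypothetical stand-ins, and the filter is a simplified version of the quoted check (it ignores the "before timeline starts" case). It also assumes the metadata table lists the partition once C2 commits files into it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class PartitionValidationSketch {

    // Simplified stand-in for the quoted check: keep a partition only if the
    // commit recorded in its partition metadata is a completed commit.
    // (The real check also accepts instants older than the start of the timeline.)
    static List<String> filterByCreatedCommit(Map<String, String> createdCommitByPartition,
                                              Set<String> completedCommits) {
        return createdCommitByPartition.entrySet().stream()
            .filter(e -> completedCommits.contains(e.getValue()))
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // C1 wrote the partition metadata, then failed and was rolled back.
        Map<String, String> createdCommitByPartition = new HashMap<>();
        createdCommitByPartition.put("2023/03/01", "C1");

        // Only C2 completed; it wrote data files into the same partition.
        Set<String> completedCommits = Collections.singleton("C2");

        // Assumption: the metadata table lists the partition because of C2.
        List<String> fromMetadataTable = Collections.singletonList("2023/03/01");

        List<String> fromFileSystem =
            filterByCreatedCommit(createdCommitByPartition, completedCommits);

        System.out.println("FS (after filter): " + fromFileSystem);
        System.out.println("MDT:               " + fromMetadataTable);
        // The lists differ, which is what triggers the false alarm.
        System.out.println("Match: " + fromFileSystem.equals(fromMetadataTable));
    }
}
```

In this sketch the file system list ends up empty while the metadata table list still contains the partition, so the comparison fails even though the table is consistent.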