[
https://issues.apache.org/jira/browse/HUDI-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Guo updated HUDI-5919:
----------------------------
Description:
In HoodieMetadataTableValidator, we compare the partition listing between MDT
and file system:
{code:java}
// ignore partitions created by uncommitted ingestion.
allPartitionPathsFromFS = allPartitionPathsFromFS.stream().parallel().filter(part -> {
  HoodiePartitionMetadata hoodiePartitionMetadata =
      new HoodiePartitionMetadata(metaClient.getFs(), FSUtils.getPartitionPath(basePath, part));
  Option<String> instantOption = hoodiePartitionMetadata.readPartitionCreatedCommitTime();
  if (instantOption.isPresent()) {
    String instantTime = instantOption.get();
    return completedTimeline.containsOrBeforeTimelineStarts(instantTime);
  } else {
    return false;
  }
}).collect(Collectors.toList());

List<String> allPartitionPathsMeta =
    FSUtils.getAllPartitionPaths(engineContext, basePath, true, cfg.assumeDatePartitioning);

Collections.sort(allPartitionPathsFromFS);
Collections.sort(allPartitionPathsMeta);

if (allPartitionPathsFromFS.size() != allPartitionPathsMeta.size()
    || !allPartitionPathsFromFS.equals(allPartitionPathsMeta)) {
  String message = "Compare Partitions Failed! " + "AllPartitionPathsFromFS : "
      + allPartitionPathsFromFS + " and allPartitionPathsMeta : " + allPartitionPathsMeta;
  LOG.error(message);
  throw new HoodieValidationException(message);
}
{code}
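For debugging a mismatch, it can help to see which partitions are present on only one side rather than the two full lists. The following is a minimal standalone sketch of the sort-and-compare step above, not the validator's actual code; the class name PartitionListingCompareSketch and the helpers sameListing and missingFrom are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class for illustration only; not part of Hudi.
class PartitionListingCompareSketch {

  // Sorts copies of both listings and compares them element by element.
  // List.equals already returns false when the sizes differ.
  static boolean sameListing(List<String> fromFs, List<String> fromMeta) {
    List<String> fs = new ArrayList<>(fromFs);
    List<String> meta = new ArrayList<>(fromMeta);
    Collections.sort(fs);
    Collections.sort(meta);
    return fs.equals(meta);
  }

  // Returns the sorted entries of `left` that do not appear in `right`.
  static List<String> missingFrom(List<String> left, List<String> right) {
    List<String> diff = new ArrayList<>(left);
    diff.removeAll(right);
    Collections.sort(diff);
    return diff;
  }

  public static void main(String[] args) {
    // Made-up partition paths, used only to exercise the helpers.
    List<String> fromFs = Arrays.asList("2023/03/02", "2023/03/01");
    List<String> fromMeta = Arrays.asList("2023/03/01", "2023/03/02", "2023/03/03");

    System.out.println("same listing: " + sameListing(fromFs, fromMeta));
    System.out.println("only in FS listing: " + missingFrom(fromFs, fromMeta));
    System.out.println("only in MDT listing: " + missingFrom(fromMeta, fromFs));
  }
}
```

Reporting only the one-sided entries keeps the error message short for tables with many partitions; whether the validator should log in this form is a design choice outside this issue.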
When deciding which partitions from the file system to include in the comparison,
we look at the commit time that created the partition:
{code:java}
if (instantOption.isPresent()) {
  String instantTime = instantOption.get();
  return completedTimeline.containsOrBeforeTimelineStarts(instantTime);
} else {
  return false;
}
{code}
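To make the filter's behavior concrete, here is a minimal standalone sketch of the predicate. It does not use Hudi classes: CompletedTimelineStub is a hypothetical stand-in for the completed timeline, and its containsOrBeforeTimelineStarts is an assumed simplification (true when the instant is in the completed set, or sorts before the earliest completed instant). The instant times are made-up short strings compared lexicographically.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;

// Hypothetical sketch for illustration only; not part of Hudi.
class PartitionFilterSketch {

  // Stand-in for the completed timeline: a sorted set of completed instant times.
  static final class CompletedTimelineStub {
    private final TreeSet<String> completedInstants;

    CompletedTimelineStub(List<String> instants) {
      this.completedInstants = new TreeSet<>(instants);
    }

    // Assumed semantics: the instant is completed, or it is earlier than the
    // first instant still present in the timeline.
    boolean containsOrBeforeTimelineStarts(String instantTime) {
      if (completedInstants.contains(instantTime)) {
        return true;
      }
      return !completedInstants.isEmpty()
          && instantTime.compareTo(completedInstants.first()) < 0;
    }
  }

  // Mirrors the filter above: keep a partition only if its creation commit
  // time is known and passes the timeline check.
  static boolean keepPartition(Optional<String> createdCommitTime,
                               CompletedTimelineStub timeline) {
    return createdCommitTime
        .map(timeline::containsOrBeforeTimelineStarts)
        .orElse(false);
  }

  public static void main(String[] args) {
    CompletedTimelineStub timeline =
        new CompletedTimelineStub(Arrays.asList("003", "004", "006"));

    System.out.println(keepPartition(Optional.of("004"), timeline)); // true: completed instant
    System.out.println(keepPartition(Optional.of("001"), timeline)); // true: before timeline start
    System.out.println(keepPartition(Optional.of("005"), timeline)); // false: not completed
    System.out.println(keepPartition(Optional.empty(), timeline));   // false: no commit time recorded
  }
}
```

The sketch shows that the decision depends only on the commit time recorded when the partition was created, not on any later commits that wrote to the same partition.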
There is one case in which this check can raise a false alarm. Consider the following case.
> Fix the validation of partition listing in metadata table validator
> -------------------------------------------------------------------
>
> Key: HUDI-5919
> URL: https://issues.apache.org/jira/browse/HUDI-5919
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Blocker
> Fix For: 0.13.1
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)