sivabalan narayanan created HUDI-5588:
-----------------------------------------
Summary: Fix Metadata table validator to deduce valid partitions
when first commit where partition was added is failed
Key: HUDI-5588
URL: https://issues.apache.org/jira/browse/HUDI-5588
Project: Apache Hudi
Issue Type: Bug
Components: tests-ci
Reporter: sivabalan narayanan
Metadata validation sometimes fails due to an issue in the test code.
FS-based listing shows 0 partitions, while MDT listing shows all 100
partitions. It's an issue with the validator code.
Actual timeline:
ls -ltr tbl1/hoodie_table/.hoodie/
total 720
drwxr-xr-x 2 nsb staff 64 Jan 17 18:45 archived
drwxr-xr-x 4 nsb staff 128 Jan 17 18:45 metadata
-rw-r--r-- 1 nsb staff 808 Jan 17 18:45 hoodie.properties
-rw-r--r-- 1 nsb staff 1230 Jan 17 18:45 20230117214546000.rollback.requested
-rw-r--r-- 1 nsb staff 0 Jan 17 18:45 20230117214546000.rollback.inflight
-rw-r--r-- 1 nsb staff 1414 Jan 17 18:46 20230117214546000.rollback
-rw-r--r-- 1 nsb staff 1230 Jan 17 18:47 20230117214701512.rollback.requested
-rw-r--r-- 1 nsb staff 0 Jan 17 18:47 20230117214701512.rollback.inflight
-rw-r--r-- 1 nsb staff 1414 Jan 17 18:47 20230117214701512.rollback
-rw-r--r-- 1 nsb staff 15492 Jan 17 18:48 20230117214831503.rollback.requested
-rw-r--r-- 1 nsb staff 0 Jan 17 18:48 20230117214831503.rollback.inflight
-rw-r--r-- 1 nsb staff 0 Jan 17 18:48 20230117214848714.deltacommit.requested
-rw-r--r-- 1 nsb staff 16359 Jan 17 18:48 20230117214831503.rollback
-rw-r--r-- 1 nsb staff 69698 Jan 17 18:49 20230117214848714.deltacommit.inflight
-rw-r--r-- 1 nsb staff 0 Jan 17 18:50 20230117215006714.deltacommit.requested
-rw-r--r-- 1 nsb staff 94423 Jan 17 18:50 20230117214848714.deltacommit
-rw-r--r-- 1 nsb staff 142198 Jan 17 18:50 20230117215006714.deltacommit.inflight
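There is at least one successful commit, 20230117214848714.deltacommit.
But our validator code reads the creation commit time of each partition and
considers the partition valid only if that particular commit succeeded. If the
commit that first added the partition failed, the partition is dropped from the
FS-based listing even though a later commit wrote to it successfully.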
{code:java}
List<String> allPartitionPathsFromFS =
    FSUtils.getAllPartitionPaths(engineContext, basePath, false, cfg.assumeDatePartitioning);
HoodieTimeline completedTimeline =
    metaClient.getActiveTimeline().filterCompletedInstants();

// ignore partitions created by uncommitted ingestion.
allPartitionPathsFromFS = allPartitionPathsFromFS.stream().parallel().filter(part -> {
  HoodiePartitionMetadata hoodiePartitionMetadata =
      new HoodiePartitionMetadata(metaClient.getFs(), FSUtils.getPartitionPath(basePath, part));
  Option<String> instantOption = hoodiePartitionMetadata.readPartitionCreatedCommitTime();
  if (instantOption.isPresent()) {
    String instantTime = instantOption.get();
    return completedTimeline.containsOrBeforeTimelineStarts(instantTime);
  } else {
    return false;
  }
}).collect(Collectors.toList());
{code}
We need to fix this.
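
To illustrate the failure mode, here is a minimal, self-contained sketch that does not use any Hudi classes. The class name, the method names, the short instant IDs ("001", "002") and the partition names are all hypothetical, and the second filter is only one possible direction for a fix (treat a partition as valid if any completed instant wrote a file to it), not necessarily what the validator should end up doing.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for the validator's partition filtering; not Hudi API.
public class PartitionValiditySketch {

  // Mirrors the current check: a partition is kept only if the instant
  // recorded as its creation time is in the set of completed instants.
  static List<String> filterByCreationInstant(
      Map<String, String> partitionToCreationInstant,
      Set<String> completedInstants) {
    return partitionToCreationInstant.entrySet().stream()
        .filter(e -> completedInstants.contains(e.getValue()))
        .map(Map.Entry::getKey)
        .sorted()
        .collect(Collectors.toList());
  }

  // Assumed alternative: also keep a partition when at least one completed
  // instant has written a file into it, even if the creating instant failed.
  static List<String> filterByCreationOrCommittedFiles(
      Map<String, String> partitionToCreationInstant,
      Map<String, Set<String>> partitionToFileInstants,
      Set<String> completedInstants) {
    return partitionToCreationInstant.keySet().stream()
        .filter(p -> completedInstants.contains(partitionToCreationInstant.get(p))
            || partitionToFileInstants
                .getOrDefault(p, Collections.<String>emptySet())
                .stream()
                .anyMatch(completedInstants::contains))
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // "001" failed and was rolled back; "002" completed (made-up IDs).
    Set<String> completed = new HashSet<>(Arrays.asList("002"));

    // Partition p1 was first created by the failed instant "001".
    Map<String, String> created = new HashMap<>();
    created.put("p1", "001");

    // The completed instant "002" later wrote a file into p1.
    Map<String, Set<String>> files = new HashMap<>();
    files.put("p1", new HashSet<>(Arrays.asList("002")));

    System.out.println(filterByCreationInstant(created, completed));
    System.out.println(filterByCreationOrCommittedFiles(created, files, completed));
  }
}
```

With the creation-time check alone, p1 is filtered out, which matches the symptom above where the FS-based listing reports 0 partitions. The assumed alternative keeps p1 because a completed instant wrote to it.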