TengHuo commented on code in PR #8503:
URL: https://github.com/apache/hudi/pull/8503#discussion_r1411860626
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bucket/HoodieSparkConsistentBucketIndex.java:
##########
@@ -189,29 +195,64 @@ public HoodieConsistentHashingMetadata loadOrCreateMetadata(HoodieTable table, S
    */
   public static Option<HoodieConsistentHashingMetadata> loadMetadata(HoodieTable table, String partition) {
     Path metadataPath = FSUtils.getPartitionPath(table.getMetaClient().getHashingMetadataPath(), partition);
-
+    Path partitionPath = FSUtils.getPartitionPath(table.getMetaClient().getBasePathV2(), partition);
Review Comment:
Is this a typo? `partitionPath` is now the data partition path, e.g.
`hdfs://.../hudi_table/date=2023-12-01/`, which does not match the logic in
`hashingMetaCommitFilePredicate`. Correct me if I'm wrong; did I miss anything?
According to the code further down in the PR, `partitionPath` is used to create a
marker file in the method `createCommitMarker`. If `partitionPath` is the path of the
data files, the marker file will be created in the data partition folder, e.g.
`hdfs://.../hudi_table/date=2023-12-01/00000000000000.commit`.
However, the code below looks for the committed metadata files in the
folder `metadataPath`:
```java
final FileStatus[] metaFiles = metaClient.getFs().listStatus(metadataPath);
final TreeSet<String> commitMetaTss = Arrays.stream(metaFiles).filter(hashingMetaCommitFilePredicate)
    .map(commitFile -> HoodieConsistentHashingMetadata.getTimestampFromFile(commitFile.getPath().getName()))
    .sorted()
    .collect(Collectors.toCollection(TreeSet::new));
```
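For readers less familiar with this pipeline, here is a minimal, Hudi-free sketch of what it computes, operating on plain file names instead of `FileStatus` objects. `IS_COMMIT_MARKER` and `timestampFromFile` are hypothetical analogues of `hashingMetaCommitFilePredicate` and `HoodieConsistentHashingMetadata.getTimestampFromFile`; the `.commit` and `.hashing_meta` suffixes, and the rule that the timestamp is everything before the first `.`, are assumptions rather than confirmed Hudi behavior.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class CommitTimestampSketch {
    // Assumption: commit markers end with ".commit"; hashing metadata files use a different suffix.
    static final String COMMIT_SUFFIX = ".commit";

    // Hypothetical analogue of hashingMetaCommitFilePredicate, applied to a bare file name.
    static final Predicate<String> IS_COMMIT_MARKER = name -> name.endsWith(COMMIT_SUFFIX);

    // Hypothetical analogue of getTimestampFromFile: the part of the name before the first '.'.
    static String timestampFromFile(String fileName) {
        int dot = fileName.indexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    // Keeps only commit markers and collects their timestamps into an ordered set.
    static TreeSet<String> committedTimestamps(List<String> fileNames) {
        return fileNames.stream()
                .filter(IS_COMMIT_MARKER)
                .map(CommitTimestampSketch::timestampFromFile)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "00000000000000.hashing_meta",
                "00000000000000.commit",
                "20231201120000.hashing_meta");
        System.out.println(committedTimestamps(names)); // prints [00000000000000]
    }
}
```

Because a `TreeSet` keeps its elements ordered, the sketch skips a separate `.sorted()` step. Like the real code, it only sees the names from the one directory it is handed, which is why the location of the marker file matters.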
The test case `TestSparkConsistentBucketClustering.testLoadMetadata` also looks
for the committed metadata files in the folder `metadataPath`:
```java
Path metadataPath = FSUtils.getPartitionPath(table.getMetaClient().getHashingMetadataPath(), p);
try {
  Arrays.stream(table.getMetaClient().getFs().listStatus(metadataPath)).forEach(fl -> {
    if (fl.getPath().getName().contains(HoodieConsistentHashingMetadata.HASHING_METADATA_COMMIT_FILE_SUFFIX)) {
      try {
        // delete commit marker to test recovery job
        table.getMetaClient().getFs().delete(fl.getPath());
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }
  });
  // ... (rest of the test excerpt omitted)
```
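To make the mismatch concrete, here is a minimal sketch on the local file system (a temp directory stands in for `hdfs://.../hudi_table`). It is not Hudi code: `writeCommitMarker` and `listCommitMarkers` are hypothetical, simplified stand-ins for `createCommitMarker` and the loader's `listStatus(metadataPath)` scan, built on `java.nio.file` instead of Hadoop's `FileSystem`. The `.commit` suffix and the `.hoodie/.bucket_index/consistent_hashing_metadata` folder layout are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CommitMarkerLocationSketch {
    // Assumed suffix of a commit marker file, e.g. "00000000000000.commit".
    static final String COMMIT_SUFFIX = ".commit";

    // Hypothetical stand-in for createCommitMarker: writes "<ts>.commit" into dir.
    static Path writeCommitMarker(Path dir, String ts) throws IOException {
        Files.createDirectories(dir);
        return Files.createFile(dir.resolve(ts + COMMIT_SUFFIX));
    }

    // Hypothetical stand-in for the loader: lists commit markers found directly in dir.
    static List<String> listCommitMarkers(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return List.of();
        }
        try (Stream<Path> files = Files.list(dir)) {
            return files.map(p -> p.getFileName().toString())
                    .filter(name -> name.endsWith(COMMIT_SUFFIX))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("hudi_table");
        String partition = "date=2023-12-01";
        // Data partition path, analogous to the one built from getBasePathV2() in the diff.
        Path partitionPath = base.resolve(partition);
        // Assumed hashing metadata folder for the same partition.
        Path metadataPath = base.resolve(".hoodie/.bucket_index/consistent_hashing_metadata").resolve(partition);
        Files.createDirectories(metadataPath);

        // Marker written next to the data files: the metadata scan does not see it.
        writeCommitMarker(partitionPath, "00000000000000");
        System.out.println("found in metadataPath: " + listCommitMarkers(metadataPath)); // prints found in metadataPath: []

        // Marker written into the metadata folder: the metadata scan sees it.
        writeCommitMarker(metadataPath, "00000000000000");
        System.out.println("found in metadataPath: " + listCommitMarkers(metadataPath)); // prints found in metadataPath: [00000000000000.commit]
    }
}
```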
BTW, this code has since been refactored into a new class, `ConsistentBucketIndexUtils`:
https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/ConsistentBucketIndexUtils.java#L106
--
This is an automated message from the Apache Git Service.