Davis-Zhang-Onehouse commented on code in PR #13489:
URL: https://github.com/apache/hudi/pull/13489#discussion_r2219987846
##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##########
@@ -291,21 +310,27 @@ public Option<HoodieIndexMetadata> getIndexMetadata() {
   if (indexMetadataOpt.isPresent() && !indexMetadataOpt.get().getIndexDefinitions().isEmpty()) {
     return indexMetadataOpt;
   }
+  Option<HoodieIndexMetadata> indexDefOption = Option.empty();
   if (tableConfig.getRelativeIndexDefinitionPath().isPresent() && StringUtils.nonEmpty(tableConfig.getRelativeIndexDefinitionPath().get())) {
-    StoragePath indexDefinitionPath =
-        new StoragePath(basePath, tableConfig.getRelativeIndexDefinitionPath().get());
-    try {
-      Option<byte[]> bytesOpt = FileIOUtils.readDataFromPath(storage, indexDefinitionPath, true);
-      if (bytesOpt.isPresent()) {
-        return Option.of(HoodieIndexMetadata.fromJson(new String(bytesOpt.get())));
-      } else {
-        return Option.of(new HoodieIndexMetadata());
-      }
-    } catch (IOException e) {
-      throw new HoodieIOException("Could not load index definition at path: " + tableConfig.getRelativeIndexDefinitionPath().get(), e);
+    indexDefOption = loadIndexDefFromStorage(basePath, tableConfig.getRelativeIndexDefinitionPath().get(), storage);
Review Comment:
> confirm these usages are actually efficient
```
public Option<HoodieIndexMetadata> getIndexMetadata() {
  if (indexMetadataOpt.isPresent() && !indexMetadataOpt.get().getIndexDefinitions().isEmpty()) {
    return indexMetadataOpt;
  }
  Option<HoodieIndexMetadata> indexDefOption = Option.empty();
  if (tableConfig.getRelativeIndexDefinitionPath().isPresent()
      && StringUtils.nonEmpty(tableConfig.getRelativeIndexDefinitionPath().get())) {
    indexDefOption = loadIndexDefFromStorage(basePath, tableConfig.getRelativeIndexDefinitionPath().get(), storage);
  }
  return indexDefOption;
}
```
The upstream logic guards the function call with caching and config checks. The
cache uses a write-through strategy on all code paths. I didn't see any issues.
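To make the write-through point concrete, here is a minimal sketch in plain Java. Every name in it (`CachedIndexDefs`, `FakeStorage`, `get`, `put`) is a hypothetical stand-in and not the actual Hudi classes or APIs. It only illustrates the pattern: the read path is guarded by the cache, and the write path updates storage and cache together. As in the snippet above, a read miss in this sketch does not populate the cache.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the metaclient's index-definition accessor; not Hudi code.
class CachedIndexDefs {

  // Hypothetical stand-in for the storage layer holding the index definition file.
  static final class FakeStorage {
    private String content; // null means the file does not exist
    final AtomicInteger reads = new AtomicInteger(); // counts storage reads

    Optional<String> read() {
      reads.incrementAndGet();
      return Optional.ofNullable(content);
    }

    void write(String newContent) {
      content = newContent;
    }
  }

  private final FakeStorage storage;
  private Optional<String> cached = Optional.empty();

  CachedIndexDefs(FakeStorage storage) {
    this.storage = storage;
  }

  // Read path: serve from the cache when it is populated, otherwise go to storage.
  Optional<String> get() {
    if (cached.isPresent()) {
      return cached;
    }
    return storage.read();
  }

  // Write path (write-through): persist to storage and refresh the cache in one step.
  void put(String defs) {
    storage.write(defs);
    cached = Optional.of(defs);
  }
}
```

With this shape, storage is only read while no definition has been written through the accessor; after the first `put`, every `get` is served from memory.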
> and from driver
I found yet another day-1 issue, which I should track as an index join blocker
(correct me if it is not a blocker):
https://issues.apache.org/jira/browse/HUDI-9614
It is a deeply nested code path:
hoodieData.mapPartition with index
-> create write handle
-> write handle creation calls metaclient-related APIs, including the index definition.
To fix this I need to think more and align with the PR reviewers first. I will
get back to them once I start working on it.
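The shape of that nested code path can be sketched with plain Java collections. All names here (`NestedLoadSketch`, `loadIndexDef`, `WriteHandle`, `mapPartitions`) are hypothetical stand-ins, not Hudi APIs, and the sketch makes no claim about what the eventual fix looks like. It only shows how a handle created per partition ends up invoking the index definition lookup once per partition.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

// Hypothetical illustration of the call chain; none of these names exist in Hudi.
class NestedLoadSketch {
  // Counts how many times the index definition lookup is invoked.
  static final AtomicInteger loads = new AtomicInteger();

  // Stand-in for a metaclient call that resolves the index definition.
  static String loadIndexDef() {
    loads.incrementAndGet();
    return "index-def";
  }

  // Stand-in for a write handle whose construction touches the metaclient.
  static final class WriteHandle {
    final int partition;
    final String indexDef;

    WriteHandle(int partition) {
      this.partition = partition;
      this.indexDef = loadIndexDef(); // nested call hidden inside handle creation
    }
  }

  // Stand-in for mapPartition with index: one write handle is created per partition.
  static List<WriteHandle> mapPartitions(List<Integer> partitions) {
    return partitions.stream().map(WriteHandle::new).collect(Collectors.toList());
  }
}
```

In this sketch, mapping over three partitions triggers three lookups, which is why the call is easy to miss when reading only the top-level `mapPartitions` code.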