onlywangyh commented on code in PR #7323:
URL: https://github.com/apache/hudi/pull/7323#discussion_r1035603153
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -146,7 +146,7 @@ protected Option<HoodieRecord<HoodieMetadataPayload>>
getRecordByKey(String key,
@Override
public List<String> getPartitionPathsWithPrefixes(List<String> prefixes)
throws IOException {
return getAllPartitionPaths().stream()
- .filter(p -> prefixes.stream().anyMatch(p::startsWith))
+ .filter(p -> prefixes.stream().anyMatch(queryPaths ->
p.startsWith(queryPaths + "/") || queryPaths.equals(p)))
Review Comment:
I changed the condition to support more scenarios.
This method returns the matching partition paths in Hudi. Suppose the table has
partitions like [/inc_day=20221120/opcode=501, /inc_day=20221120/opcode=50,
/inc_day=20221120-back/opcode=5000], and the query path prefix may be any
of the following:
1) an empty path like "";
2) part of a partition path like "/inc_day=20221120";
3) a full partition path like "/inc_day=20221120/opcode=50".
If we only use startsWith to filter the paths, we return a list that
includes unnecessary partition paths. For example:
prefixes="/inc_day=20221120/opcode=50"
matchedPartitionPaths=[/inc_day=20221120/opcode=50,
/inc_day=20221120/opcode=501]
or
prefixes="/inc_day=20221120"
matchedPartitionPaths=[/inc_day=20221120/opcode=50,
/inc_day=20221120/opcode=501, /inc_day=20221120-back/opcode=5000]
In most scenarios it is fine for matchedPartitionPaths to contain these
unnecessary partition paths. But in Hive this causes a
java.lang.RuntimeException: Invalid input path. So we want
matchedPartitionPaths to exclude these unnecessary partition paths.
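
To make the difference concrete, here is a minimal standalone sketch that runs both filters over the example partitions above. The class and method names (PrefixMatchSketch, matchByStartsWith, matchByPathBoundary) are made up for illustration and are not part of Hudi; only the two lambda conditions are taken from the diff.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of Hudi; it only mirrors the two filter conditions in the diff.
public class PrefixMatchSketch {

  // Old condition: plain string-prefix match.
  static List<String> matchByStartsWith(List<String> partitions, List<String> prefixes) {
    return partitions.stream()
        .filter(p -> prefixes.stream().anyMatch(p::startsWith))
        .collect(Collectors.toList());
  }

  // New condition: the prefix must equal the path or end at a "/" boundary.
  static List<String> matchByPathBoundary(List<String> partitions, List<String> prefixes) {
    return partitions.stream()
        .filter(p -> prefixes.stream().anyMatch(
            prefix -> p.startsWith(prefix + "/") || prefix.equals(p)))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Example partitions, written with a leading "/" as in the comment above.
    List<String> partitions = List.of(
        "/inc_day=20221120/opcode=501",
        "/inc_day=20221120/opcode=50",
        "/inc_day=20221120-back/opcode=5000");

    for (String prefix : List.of("", "/inc_day=20221120", "/inc_day=20221120/opcode=50")) {
      System.out.println("prefix=\"" + prefix + "\"");
      System.out.println("  startsWith:   " + matchByStartsWith(partitions, List.of(prefix)));
      System.out.println("  pathBoundary: " + matchByPathBoundary(partitions, List.of(prefix)));
    }
  }
}
```

With these inputs, the boundary-aware filter keeps only opcode=50 for the prefix "/inc_day=20221120/opcode=50" and drops the "-back" partition for the prefix "/inc_day=20221120". The empty prefix still matches every partition here only because the example paths start with "/".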