boneanxs commented on code in PR #8452:
URL: https://github.com/apache/hudi/pull/8452#discussion_r1199997268
##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -96,11 +109,32 @@ public List<String> getPartitionPathWithPathPrefixes(List<String> relativePathPr
}
private List<String> getPartitionPathWithPathPrefix(String relativePathPrefix) throws IOException {
+ return getPartitionPathWithPathPrefixUsingFilterExpression(relativePathPrefix, null, null);
+ }
+
+ private List<String> getPartitionPathWithPathPrefixUsingFilterExpression(String relativePathPrefix,
+ Types.RecordType partitionFields,
+ Expression expression) throws IOException {
List<Path> pathsToList = new CopyOnWriteArrayList<>();
pathsToList.add(StringUtils.isNullOrEmpty(relativePathPrefix)
- ? new Path(datasetBasePath) : new Path(datasetBasePath, relativePathPrefix));
+ ? dataBasePath.get() : new Path(dataBasePath.get(), relativePathPrefix));
List<String> partitionPaths = new CopyOnWriteArrayList<>();
+ int partitionLevel = -1;
+ boolean needPushDownExpressions;
+ // Not like `HoodieBackedTableMetadata`, since we don't know the exact partition levels here,
+ // given it's possible that partition values contains `/`, which could affect
+ // the result to get right `partitionValue` when listing paths, here we have
+ // to make it more strict that `urlEncodePartitioningEnabled` must be enabled.
+ // TODO better enable urlEncodePartitioningEnabled if hiveStylePartitioningEnabled is enabled?
Review Comment:
To get the partition levels in `FileSystemBackedTableMetadata`, we have to list at least one partition path, so we can verify that the number of partition columns matches the number of partition levels. But that extra listing can be time-consuming, especially on object stores.
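To illustrate why a `/` inside a partition value breaks level inference, here is a minimal Java sketch. It is not Hudi code: the class and method names are hypothetical, and `java.net.URLEncoder` (Java 10+ `Charset` overload) stands in for whatever escaping Hudi actually applies when `urlEncodePartitioningEnabled` is on. It builds a hive-style relative partition path, counts its directory levels, and compares that count with the number of partition fields.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not part of Hudi.
class PartitionLevelCheck {

  // Builds a hive-style relative partition path such as "region=us/dt=2023-05-01".
  // When encode is true, each value is URL-encoded, so a '/' inside a value
  // becomes "%2F" and can no longer be mistaken for a directory separator.
  static String buildPartitionPath(List<String> fields, List<String> values, boolean encode) {
    List<String> segments = new ArrayList<>();
    for (int i = 0; i < fields.size(); i++) {
      String value = encode
          ? URLEncoder.encode(values.get(i), StandardCharsets.UTF_8)
          : values.get(i);
      segments.add(fields.get(i) + "=" + value);
    }
    return String.join("/", segments);
  }

  // Number of directory levels in a listed relative partition path.
  static int partitionLevels(String relativePartitionPath) {
    return relativePartitionPath.split("/").length;
  }

  // In this sketch, pushing partition predicates down is treated as safe only
  // when every directory level maps to exactly one partition column.
  static boolean levelsMatchFields(String relativePartitionPath, int numPartitionFields) {
    return partitionLevels(relativePartitionPath) == numPartitionFields;
  }

  public static void main(String[] args) {
    List<String> fields = Arrays.asList("region", "dt");
    List<String> values = Arrays.asList("us/west", "2023-05-01");

    String raw = buildPartitionPath(fields, values, false);
    String encoded = buildPartitionPath(fields, values, true);

    System.out.println(raw + " -> levels=" + partitionLevels(raw)
        + ", matches=" + levelsMatchFields(raw, fields.size()));
    System.out.println(encoded + " -> levels=" + partitionLevels(encoded)
        + ", matches=" + levelsMatchFields(encoded, fields.size()));
  }
}
```

Running it prints `region=us/west/dt=2023-05-01 -> levels=3, matches=false` and then `region=us%2Fwest/dt=2023-05-01 -> levels=2, matches=true`: without encoding, one listed path yields three levels for two partition columns, so the mapping from levels to columns is ambiguous.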
So here I make `FileSystemBackedTableMetadata` stricter by also requiring `urlEncodePartitioningEnabled`. We don't need that check in `HoodieBackedTableMetadata`, since there we can simply get one partition path and check it.
Should we enable `urlEncodePartitioningEnabled` whenever `hiveStylePartitioningEnabled` is enabled? That matches the default behavior of Spark and Hive.
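A minimal sketch of the proposed default, assuming both flags are read from writer properties under the keys `hoodie.datasource.write.hive_style_partitioning` and `hoodie.datasource.write.partitionpath.urlencode`. The `PartitionEncodingDefaults` class and its `effectiveUrlEncode` helper are hypothetical, and letting an explicitly set URL-encode flag win is an assumption, chosen so that tables which already set the flag keep their existing layout.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating the suggested default; not Hudi code.
class PartitionEncodingDefaults {
  static final String HIVE_STYLE_KEY = "hoodie.datasource.write.hive_style_partitioning";
  static final String URL_ENCODE_KEY = "hoodie.datasource.write.partitionpath.urlencode";

  // Assumed rule: an explicit URL-encode setting always wins; otherwise URL
  // encoding follows the hive-style flag (both default to false when absent).
  static boolean effectiveUrlEncode(Map<String, String> props) {
    String explicit = props.get(URL_ENCODE_KEY);
    if (explicit != null) {
      return Boolean.parseBoolean(explicit);
    }
    return Boolean.parseBoolean(props.getOrDefault(HIVE_STYLE_KEY, "false"));
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put(HIVE_STYLE_KEY, "true");
    System.out.println(effectiveUrlEncode(props)); // true: inherited from hive-style

    props.put(URL_ENCODE_KEY, "false");
    System.out.println(effectiveUrlEncode(props)); // false: explicit setting wins
  }
}
```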