prashantwason commented on a change in pull request #2496:
URL: https://github.com/apache/hudi/pull/2496#discussion_r568073678
##########
File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java
##########
@@ -419,13 +417,8 @@ public static boolean isLogFile(Path logPath) {
* Get the names of all the base and log files in the given partition path.
*/
public static FileStatus[] getAllDataFilesInPartition(FileSystem fs, Path partitionPath) throws IOException {
- final Set<String> validFileExtensions = Arrays.stream(HoodieFileFormat.values())
- .map(HoodieFileFormat::getFileExtension).collect(Collectors.toCollection(HashSet::new));
- final String logFileExtension = HoodieFileFormat.HOODIE_LOG.getFileExtension();
-
return Arrays.stream(fs.listStatus(partitionPath, path -> {
- String extension = FSUtils.getFileExtension(path.getName());
- return validFileExtensions.contains(extension) || path.getName().contains(logFileExtension);
+ return HoodieFileFormat.isBaseFile(path) || isLogFile(path);
})).filter(FileStatus::isFile).toArray(FileStatus[]::new);
}
Review comment:
Not really, as one can actually create a directory with a .parquet suffix:
mkdir /user/pwason/abcd.parquet
hdfs dfs -ls /user/pwason
drwxr-xr-x - pwason hadoop 0 2021-02-01 19:11 /user/pwason/abcd.parquet
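
To illustrate the point, here is a minimal sketch that reproduces the situation on a local filesystem with java.nio rather than the Hadoop FileSystem API. The class name ParquetSuffixDemo and the helper listParquet are hypothetical and not part of Hudi. The sketch only assumes that the concern is a name-based check matching a directory, which is what the listing above suggests.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; it uses the local filesystem, not HDFS or Hudi.
public class ParquetSuffixDemo {

    // Returns the sorted names of entries in dir that end with ".parquet".
    // When filesOnly is true, directories with that suffix are dropped.
    static List<String> listParquet(Path dir, boolean filesOnly) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                .filter(p -> p.getFileName().toString().endsWith(".parquet"))
                .filter(p -> !filesOnly || Files.isRegularFile(p))
                .map(p -> p.getFileName().toString())
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path partition = Files.createTempDirectory("partition");
        // A directory whose name merely looks like a base file.
        Files.createDirectory(partition.resolve("abcd.parquet"));
        // An actual (empty) file with the same suffix.
        Files.createFile(partition.resolve("data.parquet"));

        // The name check alone matches both entries.
        System.out.println(listParquet(partition, false)); // [abcd.parquet, data.parquet]
        // Adding the regular-file check keeps only the real file.
        System.out.println(listParquet(partition, true));  // [data.parquet]
    }
}
```

In this sketch the second filter plays the role that .filter(FileStatus::isFile) plays in the diff above: the extension check alone cannot tell a directory from a data file.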