pengzhiwei2018 commented on a change in pull request #2893:
URL: https://github.com/apache/hudi/pull/2893#discussion_r629013054
##########
File path:
hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java
##########
@@ -141,6 +143,31 @@ protected BaseTableMetadata(HoodieEngineContext
engineContext, HoodieMetadataCon
.getAllFilesInPartition(partitionPath);
}
+ @Override
+ public Map<String, FileStatus[]> getAllFilesInPartitions(List<String>
partitionPaths)
+ throws IOException {
+ if (enabled) {
+ Map<String, FileStatus[]> partitionsFilesMap = new HashMap<>();
+
+ try {
+ for (String partitionPath : partitionPaths) {
+ partitionsFilesMap.put(partitionPath, fetchAllFilesInPartition(new
Path(partitionPath)));
+ }
+ } catch (Exception e) {
+ if (metadataConfig.enableFallback()) {
+ LOG.error("Failed to retrieve files in partitions from metadata", e);
Review comment:
If enable the fallback here, an empty `partitionsFilesMap` will return
if there is an Exception happened, is it right?
##########
File path:
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java
##########
@@ -105,6 +106,20 @@ public FileSystemBackedTableMetadata(HoodieEngineContext
engineContext, Serializ
return partitionPaths;
}
+ @Override
+ public Map<String, FileStatus[]> getAllFilesInPartitions(List<String>
partitionPaths)
+ throws IOException {
+ int parallelism = Math.min(DEFAULT_LISTING_PARALLELISM,
partitionPaths.size());
Review comment:
If the `partitionPaths` is empty, the `parallelism` will be 0, there may
be an Exception ("Positive number of partitions required") throw out for the
`sparkContext.parallelize(seq, parallelism)`,
##########
File path:
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieFileIndex.scala
##########
@@ -227,7 +227,17 @@ case class HoodieFileIndex(
*/
private def loadPartitionPathFiles(): Map[PartitionRowPath,
Array[FileStatus]] = {
val sparkEngine = new HoodieSparkEngineContext(new
JavaSparkContext(spark.sparkContext))
+ val serializableConf = new
SerializableConfiguration(spark.sessionState.newHadoopConf())
+
val properties = new Properties()
+ // To support metadata listing via Spark SQL we allow users to pass the
config via Hadoop Conf. Spark SQL does not
Review comment:
Should we get these `configurations` from the `spark.sessionState.conf`
for spark?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]