lukemarles commented on pull request #2893:
URL: https://github.com/apache/hudi/pull/2893#issuecomment-836301088
stop sending me shit
Sent from my iPhone
> On 10 May 2021, at 12:47 pm, pengzhiwei ***@***.***> wrote:
>
>
> @pengzhiwei2018 commented on this pull request.
>
> In hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java:
>
> > @@ -141,6 +143,31 @@ protected BaseTableMetadata(HoodieEngineContext engineContext, HoodieMetadataCon
> .getAllFilesInPartition(partitionPath);
> }
>
> + @Override
> + public Map<String, FileStatus[]> getAllFilesInPartitions(List<String> partitionPaths)
> + throws IOException {
> + if (enabled) {
> + Map<String, FileStatus[]> partitionsFilesMap = new HashMap<>();
> +
> + try {
> + for (String partitionPath : partitionPaths) {
> + partitionsFilesMap.put(partitionPath, fetchAllFilesInPartition(new Path(partitionPath)));
> + }
> + } catch (Exception e) {
> + if (metadataConfig.enableFallback()) {
> + LOG.error("Failed to retrieve files in partitions from metadata", e);
> If the fallback is enabled here, an empty partitionsFilesMap will be returned when an Exception happens, is that right?
>
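To illustrate the concern above, here is a minimal sketch (the class, method, and lister names are hypothetical, not the Hudi API): on a listing failure the partially filled map is discarded and the whole listing is redone through a fallback path, so the caller never observes a truncated or empty result.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not the Hudi implementation: if metadata-based listing
// fails partway through, rebuild the entire map via a fallback lister instead
// of returning the partially filled (or empty) map.
public class PartitionListingSketch {
  static Map<String, String[]> getAllFilesInPartitions(
      List<String> partitionPaths,
      Function<String, String[]> metadataLister,   // may throw at runtime
      Function<String, String[]> fallbackLister) { // e.g. a direct file-system listing
    Map<String, String[]> partitionsFilesMap = new HashMap<>();
    try {
      for (String partitionPath : partitionPaths) {
        partitionsFilesMap.put(partitionPath, metadataLister.apply(partitionPath));
      }
      return partitionsFilesMap;
    } catch (RuntimeException e) {
      // Discard partial results and list every partition again via the fallback.
      Map<String, String[]> fallbackMap = new HashMap<>();
      for (String partitionPath : partitionPaths) {
        fallbackMap.put(partitionPath, fallbackLister.apply(partitionPath));
      }
      return fallbackMap;
    }
  }
}
```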
> In hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
>
> > @@ -105,6 +106,20 @@ public FileSystemBackedTableMetadata(HoodieEngineContext engineContext, Serializ
> return partitionPaths;
> }
>
> + @Override
> + public Map<String, FileStatus[]> getAllFilesInPartitions(List<String> partitionPaths)
> + throws IOException {
> + int parallelism = Math.min(DEFAULT_LISTING_PARALLELISM, partitionPaths.size());
> If partitionPaths is empty, the parallelism will be 0, and sparkContext.parallelize(seq, parallelism) may throw an Exception ("Positive number of partitions required").
>
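A small sketch of a guard for the zero-parallelism case the reviewer describes (the class name is hypothetical; DEFAULT_LISTING_PARALLELISM's value is assumed here, not taken from Hudi): clamping the computed value to at least 1 keeps it valid even for an empty partition list, and a caller could equally return early when there is nothing to list.

```java
import java.util.List;

// Hypothetical guard, not the Hudi code: keep the parallelism positive so that
// sparkContext.parallelize(seq, parallelism) never receives 0, which would
// throw "Positive number of partitions required".
public class ParallelismGuard {
  static final int DEFAULT_LISTING_PARALLELISM = 1500; // assumed value for illustration

  static int safeParallelism(List<String> partitionPaths) {
    // Math.max(1, ...) keeps the result positive even when the list is empty.
    return Math.max(1, Math.min(DEFAULT_LISTING_PARALLELISM, partitionPaths.size()));
  }
}
```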
> In hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieFileIndex.scala:
>
> > val properties = new Properties()
> + // To support metadata listing via Spark SQL we allow users to pass the config via Hadoop Conf. Spark SQL does not
> Should we get these configurations from spark.sessionState.conf for Spark?
>
> —
> You are receiving this because you are subscribed to this thread.
> Reply to this email directly, view it on GitHub, or unsubscribe.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]