lukemarles commented on pull request #2893:
URL: https://github.com/apache/hudi/pull/2893#issuecomment-836301088
stop sending me shit
Sent from my iPhone
> On 10 May 2021, at 12:47 pm, pengzhiwei ***@***.***> wrote:
>
>
> @pengzhiwei2018 commented on this pull request.
>
> In hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java:
>
> > @@ -141,6 +143,31 @@ protected BaseTableMetadata(HoodieEngineContext engineContext, HoodieMetadataCon
> .getAllFilesInPartition(partitionPath);
> }
>
> + @Override
> + public Map<String, FileStatus[]> getAllFilesInPartitions(List<String> partitionPaths)
> + throws IOException {
> + if (enabled) {
> + Map<String, FileStatus[]> partitionsFilesMap = new HashMap<>();
> +
> + try {
> + for (String partitionPath : partitionPaths) {
> + partitionsFilesMap.put(partitionPath, fetchAllFilesInPartition(new Path(partitionPath)));
> + }
> + } catch (Exception e) {
> + if (metadataConfig.enableFallback()) {
> + LOG.error("Failed to retrieve files in partitions from metadata", e);
> If the fallback is enabled here, an empty partitionsFilesMap will be returned when an Exception happens, is that right?
>
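To illustrate the concern above, here is a minimal sketch (the class, method, and lister names are hypothetical, not the Hudi API): on a listing failure the partially filled map is discarded and the whole listing is redone through a fallback path, so the caller never observes a truncated or empty result.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not the Hudi implementation: if metadata-based listing
// fails partway through, rebuild the entire map via a fallback lister instead
// of returning the partially filled (or empty) map.
public class PartitionListingSketch {
  static Map<String, String[]> getAllFilesInPartitions(
      List<String> partitionPaths,
      Function<String, String[]> metadataLister,   // may throw at runtime
      Function<String, String[]> fallbackLister) { // e.g. a direct file-system listing
    Map<String, String[]> partitionsFilesMap = new HashMap<>();
    try {
      for (String partitionPath : partitionPaths) {
        partitionsFilesMap.put(partitionPath, metadataLister.apply(partitionPath));
      }
      return partitionsFilesMap;
    } catch (RuntimeException e) {
      // Discard partial results and list every partition again via the fallback.
      Map<String, String[]> fallbackMap = new HashMap<>();
      for (String partitionPath : partitionPaths) {
        fallbackMap.put(partitionPath, fallbackLister.apply(partitionPath));
      }
      return fallbackMap;
    }
  }
}
```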
> In hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
>
> > @@ -105,6 +106,20 @@ public FileSystemBackedTableMetadata(HoodieEngineContext engineContext, Serializ
> return partitionPaths;
> }
>
> + @Override
> + public Map<String, FileStatus[]> getAllFilesInPartitions(List<String> partitionPaths)
> + throws IOException {
> + int parallelism = Math.min(DEFAULT_LISTING_PARALLELISM, partitionPaths.size());
> If partitionPaths is empty, the parallelism will be 0, and sparkContext.parallelize(seq, parallelism) may throw an Exception ("Positive number of partitions required").
>
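A small sketch of a guard for the zero-parallelism case the reviewer describes (the class name is hypothetical; DEFAULT_LISTING_PARALLELISM's value is assumed here, not taken from Hudi): clamping the computed value to at least 1 keeps it valid even for an empty partition list, and a caller could equally return early when there is nothing to list.

```java
import java.util.List;

// Hypothetical guard, not the Hudi code: keep the parallelism positive so that
// sparkContext.parallelize(seq, parallelism) never receives 0, which would
// throw "Positive number of partitions required".
public class ParallelismGuard {
  static final int DEFAULT_LISTING_PARALLELISM = 1500; // assumed value for illustration

  static int safeParallelism(List<String> partitionPaths) {
    // Math.max(1, ...) keeps the result positive even when the list is empty.
    return Math.max(1, Math.min(DEFAULT_LISTING_PARALLELISM, partitionPaths.size()));
  }
}
```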
> In hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieFileIndex.scala:
>
> > val properties = new Properties()
> + // To support metadata listing via Spark SQL we allow users to pass the config via Hadoop Conf. Spark SQL does not
> Should we get these configurations from spark.sessionState.conf for Spark?
>
> —
> You are receiving this because you are subscribed to this thread.
> Reply to this email directly, view it on GitHub, or unsubscribe.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]