yui2010 commented on a change in pull request #2378:
URL: https://github.com/apache/hudi/pull/2378#discussion_r570099423



##########
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala
##########
@@ -108,7 +111,7 @@ class MergeOnReadSnapshotRelation(val sqlContext: SQLContext,
       dataSchema = tableStructSchema,
       partitionSchema = StructType(Nil),

Review comment:
       Hi @garyli1019, sorry for the late reply; I have been very busy with end-of-year work.
   I have submitted a test case, testPrunePartitions, to demonstrate this.
   

##########
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala
##########
@@ -77,18 +81,26 @@ object HoodieSparkUtils {
    * @return list of absolute file paths
    */
   def checkAndGlobPathIfNecessary(paths: Seq[String], fs: FileSystem): Seq[Path] = {
+    val globPaths =
     paths.flatMap(path => {
       val qualified = new Path(path).makeQualified(fs.getUri, fs.getWorkingDirectory)
       val globPaths = globPathIfNecessary(fs, qualified)
       globPaths
     })
+    val filteredGlobPaths = globPaths.filterNot( path => TablePathUtils.isHoodieMetaPath(path.toString) || shouldFilterOut(path.getName))

Review comment:
       There are two reasons for filtering out the Hudi meta path:
   1.  If the load path is a glob such as basePath/\*/\*, it also matches every .hoodie/*.deltacommit file, which makes Spark issue many fs.listStatus calls. This is inefficient.
   2.  Loading the .hoodie/*.deltacommit files causes an exception when Spark's listFiles is used to prune partitions, because discoveredBasePaths.distinct.size is then 2.
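
   To make the intent of the filter concrete, here is a minimal, self-contained sketch that works on plain strings instead of Hadoop `Path` objects. Both helper functions are hypothetical stand-ins written for illustration only: the real `TablePathUtils.isHoodieMetaPath` and `shouldFilterOut` may apply different rules, and the object name and sample paths are assumptions.

```scala
object MetaPathFilterSketch {
  // Hypothetical stand-in for TablePathUtils.isHoodieMetaPath:
  // treats any path that has a ".hoodie" component as a metadata path.
  def isHoodieMetaPath(path: String): Boolean =
    path.split('/').contains(".hoodie")

  // Hypothetical stand-in for shouldFilterOut:
  // skips hidden or underscore-prefixed file names (an assumption).
  def shouldFilterOut(name: String): Boolean =
    name.startsWith(".") || name.startsWith("_")

  // Mirrors the shape of the filterNot call in the diff, on plain strings.
  def filterGlobbed(paths: Seq[String]): Seq[String] =
    paths.filterNot { p =>
      val name = p.substring(p.lastIndexOf('/') + 1) // last path component
      isHoodieMetaPath(p) || shouldFilterOut(name)
    }
}
```

   With a glob result such as `/tmp/tbl/.hoodie/20210101.deltacommit`, `/tmp/tbl/2021/01/part-0001.parquet` and `/tmp/tbl/2021/_SUCCESS`, only the parquet path would remain, so the later listing and partition discovery steps never see the timeline files.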
   






