[GitHub] [hudi] yui2010 commented on a change in pull request #2378: [HUDI-1491] Support partition pruning for MOR snapshot query

GitBox Thu, 04 Feb 2021 08:20:08 -0800


yui2010 commented on a change in pull request #2378:
URL: https://github.com/apache/hudi/pull/2378#discussion_r570113203




##########
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala
##########
@@ -77,18 +81,26 @@ object HoodieSparkUtils {
    * @return list of absolute file paths
    */
   def checkAndGlobPathIfNecessary(paths: Seq[String], fs: FileSystem): Seq[Path] = {
+    val globPaths =
     paths.flatMap(path => {
       val qualified = new Path(path).makeQualified(fs.getUri, fs.getWorkingDirectory)
       val globPaths = globPathIfNecessary(fs, qualified)
       globPaths
     })
+    val filteredGlobPaths = globPaths.filterNot( path => TablePathUtils.isHoodieMetaPath(path.toString) || shouldFilterOut(path.getName))

Review comment:
       There are two reasons to filter out the Hoodie meta path:
   1.  If our load path looks like basePath/\*/\*, the glob also matches every 
.hoodie/*.deltacommit file, which makes Spark issue many fs.listStatus calls. 
This is inefficient.
   2.  Loading all the .hoodie/*.deltacommit files also causes an exception, 
because discoveredBasePaths.distinct.size is 2 when we use Spark's listFiles 
to prune partitions.
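
To make the first point concrete, here is a minimal sketch of the filtering step. It uses plain strings instead of Hadoop `Path` objects so that it runs without extra dependencies. `isUnderHoodieMetaDir` is a hypothetical stand-in for `TablePathUtils.isHoodieMetaPath`, whose real implementation may differ, and the sample paths are invented for illustration.

```scala
object MetaPathFilterSketch {
  // Hypothetical stand-in for TablePathUtils.isHoodieMetaPath (assumption):
  // a path counts as a meta path when one of its components is ".hoodie".
  def isUnderHoodieMetaDir(path: String): Boolean =
    path.split('/').contains(".hoodie")

  // Drop meta paths from the globbed results so only data paths remain.
  def filterDataPaths(globbed: Seq[String]): Seq[String] =
    globbed.filterNot(isUnderHoodieMetaDir)

  def main(args: Array[String]): Unit = {
    // Invented example of what a glob such as basePath/*/* might return.
    val globbed = Seq(
      "/tmp/basePath/2021/02",
      "/tmp/basePath/.hoodie/20210204.deltacommit",
      "/tmp/basePath/.hoodie/hoodie.properties",
      "/tmp/basePath/2021/03")
    filterDataPaths(globbed).foreach(println)
  }
}
```

With the meta entries removed, only the partition directories are handed on for listing, so the later partition discovery should see a single base path.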
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
