[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1348: HUDI-597 Enable incremental pulling from defined partitions

GitBox Sun, 23 Feb 2020 16:33:52 -0800

garyli1019 commented on a change in pull request #1348: HUDI-597 Enable incremental pulling from defined partitions
URL: https://github.com/apache/incubator-hudi/pull/1348#discussion_r383055560


 ##########
 File path: hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala
 ##########
 @@ -100,17 +101,22 @@ class IncrementalRelation(val sqlContext: SQLContext,
         .get, classOf[HoodieCommitMetadata])
       fileIdToFullPath ++= metadata.getFileIdAndFullPaths(basePath).toMap
     }
+    val pathGlobPattern = optParams.getOrElse(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY, "")
+    val filteredFullPath = if(!pathGlobPattern.equals("")) {
+      val globMatcher = new GlobPattern("*" + pathGlobPattern)
 
 Review comment:
   The path here is a full HDFS path, so we need the leading `*` to match the prefix. The benefit of including `*` here is that users get a consistent interface: when loading the full table they call `.load(basePath + "/2016/*/*/*")`, and for incremental pulling the `String` they define is the same. If we leave the `*` to the user, I think it could cause confusion, and users would have to read this part of the code themselves to fully understand how it works.
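To illustrate why the leading `*` matters, here is a minimal Scala sketch. It does not use Hadoop's `GlobPattern`; `GlobSketch` and its methods are hypothetical names, and the glob-to-regex translation is a simplified assumption that only handles `*` and `?` (no `{a,b}` alternation or `[...]` classes). Matching is full-string, so a pattern written relative to the base path, such as `/2016/*/*/*`, only matches a full HDFS file path once a `*` is prepended to absorb the `hdfs://.../basePath` prefix. The example paths are made up.

```scala
import java.util.regex.Pattern

// Hypothetical, simplified stand-in for a glob matcher (NOT Hadoop's GlobPattern).
// Only `*` and `?` are wildcards; every other character is matched literally.
object GlobSketch {
  // Translate a glob into a regex. In this sketch `*` becomes `.*`,
  // so it can also match across `/` separators.
  def globToRegex(glob: String): Pattern = {
    val sb = new StringBuilder
    glob.foreach {
      case '*' => sb.append(".*")
      case '?' => sb.append('.')
      case c   => sb.append(Pattern.quote(c.toString))
    }
    Pattern.compile(sb.toString)
  }

  // Full-string match: the whole path must be covered by the pattern.
  def matches(glob: String, path: String): Boolean =
    globToRegex(glob).matcher(path).matches()

  // Mirrors the idea in the diff: an empty pattern keeps every path,
  // otherwise `*` is prepended so a base-path-relative pattern can
  // match full file paths.
  def filterPaths(userPattern: String, fullPaths: Seq[String]): Seq[String] =
    if (userPattern.isEmpty) fullPaths
    else fullPaths.filter(p => matches("*" + userPattern, p))
}
```

For example, `GlobSketch.filterPaths("/2016/*/*/*", ...)` keeps `hdfs://nn/tables/trips/2016/03/15/f1.parquet` and drops `hdfs://nn/tables/trips/2017/01/02/f2.parquet`, while `GlobSketch.matches("/2016/*/*/*", ...)` without the prepended `*` returns `false` for the same full 2016 path, because the path starts with `hdfs:` rather than `/2016`. Because `*` is translated to `.*` in this sketch, the pattern would also accept deeper nested paths; a stricter translation could use `[^/]*` instead.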
   Yeah, I couldn't find any documentation either. The `GlobFilter` in the API list uses `GlobPattern` internally:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobFilter.java#L67
and the class itself is still around:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services