garyli1019 commented on a change in pull request #1348: HUDI-597 Enable
incremental pulling from defined partitions
URL: https://github.com/apache/incubator-hudi/pull/1348#discussion_r383055560
##########
File path: hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala
##########
@@ -100,17 +101,22 @@ class IncrementalRelation(val sqlContext: SQLContext,
.get, classOf[HoodieCommitMetadata])
fileIdToFullPath ++= metadata.getFileIdAndFullPaths(basePath).toMap
}
+ val pathGlobPattern = optParams.getOrElse(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY, "")
+ val filteredFullPath = if(!pathGlobPattern.equals("")) {
+ val globMatcher = new GlobPattern("*" + pathGlobPattern)
Review comment:
The path here is a full HDFS path, so we need the leading `*` to match the
prefix. The benefit of adding `*` here is that users get a consistent
interface: when loading the full table they call `.load(basePath +
"/2016/*/*/*")`, and for incremental pulling the `String` they define is the
same. If we leave the `*` to the user, I think it could cause confusion, and
users would have to read this part of the code themselves to fully understand
how it works.
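To illustrate why the prefix is needed, here is a minimal sketch. It does not
use Hadoop's `GlobPattern`: `globToRegex` is a hypothetical stand-in that
assumes `*` may match any run of characters, `/` included, and the HDFS path
below is made up.

```scala
import java.util.regex.Pattern

object GlobPrefixSketch {
  // Hypothetical helper (not the Hadoop API): translate a glob into a regex
  // in which `*` matches any run of characters, '/' included. All other
  // characters are matched literally.
  def globToRegex(glob: String): Pattern = {
    val regex = glob.split("\\*", -1).map(s => Pattern.quote(s)).mkString(".*")
    Pattern.compile(regex)
  }

  def main(args: Array[String]): Unit = {
    // Made-up full path of the kind recorded in the commit metadata.
    val fullPath = "hdfs://namenode:8020/data/trips/2016/03/15/part-0001.parquet"
    // The same glob suffix a user would append to basePath in .load(...).
    val userGlob = "/2016/*/*/*"

    // Without the leading `*`, the glob has to match from the very start.
    println(globToRegex(userGlob).matcher(fullPath).matches())       // false
    // With the leading `*`, the scheme, host and base path are absorbed.
    println(globToRegex("*" + userGlob).matcher(fullPath).matches()) // true
  }
}
```

With the `*` added internally, the user passes the same `/2016/*/*/*` string in
both the full-table load and the incremental pull.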
Yeah, I couldn't find any documentation either. The `GlobFilter` in the API
list uses `GlobPattern` internally:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobFilter.java#L67
and the class is still around:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobPattern.java