umehrot2 commented on a change in pull request #2651:
URL: https://github.com/apache/hudi/pull/2651#discussion_r592005038



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
##########
@@ -66,10 +69,15 @@ class DefaultSource extends RelationProvider
   override def createRelation(sqlContext: SQLContext,
                               optParams: Map[String, String],
                               schema: StructType): BaseRelation = {
+    // Remove the "*" from the path in order to be compatible with the previous query path with "*"
+    val path = removeStar(optParams.get("path"))

Review comment:
       I would prefer not to build on this assumption. Stripping the "*" would
break existing queries, or return incorrect results, for Hudi users who have
relied on globbed paths until now. If possible, we should implement this in a
way that supports both globbed and non-globbed paths.
   
   Spark's `InMemoryFileIndex`, for example, can handle both globbed and
non-globbed paths, and if we are implementing our own FileIndex we should see
whether ours can do the same. I need to take a deeper look, but would it not be
possible to glob the paths up front, pass all of the resulting paths to
`HoodieFileIndex`, and then list each of them the way `InMemoryFileIndex` does?
`InMemoryFileIndex` accepts multiple `rootPathsSpecified`.
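
   To make the suggestion concrete, here is a rough sketch of the "glob first, then list every root" idea. It is only an illustration under stated assumptions: `GlobRootPaths`, `isGlob` and `resolveRootPaths` are hypothetical names, not existing Hudi or Spark APIs, and it uses `java.nio.file` on the local filesystem so that it stays self-contained. A real implementation would presumably go through Hadoop's `FileSystem.globStatus` instead, and whether `HoodieFileIndex` can accept several root paths is exactly the open question above.

```scala
import java.nio.file.{FileSystems, Files, Paths}
import scala.collection.JavaConverters._

// Hypothetical helper for illustration only; not part of Hudi or Spark.
object GlobRootPaths {
  // Characters treated as glob metacharacters (an assumption of this sketch).
  private val GlobChars = Set('*', '?', '[', ']', '{', '}')

  def isGlob(path: String): Boolean = path.exists(GlobChars.contains)

  /** Expand a possibly globbed path into concrete root paths.
    * A non-globbed path is returned unchanged as a single root. */
  def resolveRootPaths(path: String): Seq[String] = {
    if (!isGlob(path)) {
      Seq(path)
    } else {
      val parts = path.split("/").toSeq
      // Split into the fixed prefix and the part starting at the first glob component.
      val (fixed, pattern) = parts.span(p => !isGlob(p))
      val baseStr = fixed.mkString("/")
      val base = Paths.get(if (baseStr.isEmpty && path.startsWith("/")) "/" else baseStr)
      if (!Files.exists(base)) {
        Seq.empty
      } else {
        val matcher = FileSystems.getDefault.getPathMatcher("glob:" + parts.mkString("/"))
        // Walk only as deep as the pattern has components below the fixed prefix.
        val stream = Files.walk(base, pattern.length)
        try {
          stream.iterator().asScala
            .filter(p => matcher.matches(p))
            .map(_.toString)
            .toList
            .sorted
        } finally {
          stream.close()
        }
      }
    }
  }
}
```

   With something along these lines, `createRelation` could call `resolveRootPaths` on the user-supplied path and hand every resulting root to the file index, so a globbed path and a plain table path would go through the same code path rather than having the "*" stripped.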




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]