garyli1019 commented on a change in pull request #1722:
URL: https://github.com/apache/hudi/pull/1722#discussion_r445305619



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
##########
@@ -57,8 +57,7 @@ class DefaultSource extends RelationProvider
     if (path.isEmpty) {
       throw new HoodieException("'path' must be specified.")
     }
-
-    if (parameters(QUERY_TYPE_OPT_KEY).equals(QUERY_TYPE_SNAPSHOT_OPT_VAL)) {
+        if (parameters(QUERY_TYPE_OPT_KEY).equals(QUERY_TYPE_READ_OPTIMIZED_OPT_VAL)) {
       // this is just effectively RO view only, where `path` can contain a mix of

Review comment:
   This is hard to do without a path handler. Today we tell users to append a
glob pattern for COW tables, e.g. `load(basePath + "/*/*/*/*")`, so the glob
ends up inside what should be the `basePath`. Spark's default
`DataSource.apply().resolveRelation()` can handle the glob, but our custom
relation cannot. This is why our incremental relation requires plain
`.load(basePath)`. Udit's PR adds this path handler, which will give us a
single place to handle all the paths.
   I think we can change the default option to `READ_OPTIMIZED` so that users
are not affected. Currently the Spark datasource only supports snapshot queries
on COW tables anyway, which is the same as `READ_OPTIMIZED`. We can switch back
later, once the path handler is in:
https://github.com/apache/hudi/pull/1702/files#diff-683cf2c70477ed6cc0a484a2ae494999R72
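
   To make the two path conventions concrete, here is a minimal sketch. The
object `LoadPathSketch`, the function `loadPath`, and the `partitionDepth`
parameter are hypothetical names used only for illustration; they are not part
of Hudi and not the path handler from Udit's PR. The sketch only shows which
string would be passed to `load()` for each query type under the current
behavior.

```scala
// Hypothetical helper, NOT part of Hudi: it only illustrates which path
// string is passed to load() for each query type, as described above.
object LoadPathSketch {
  // Query type values as defined in Hudi's DataSourceReadOptions.
  val ReadOptimized = "read_optimized"
  val Incremental = "incremental"

  // partitionDepth is an assumption: the number of partition directory
  // levels under basePath (e.g. 3 for a yyyy/mm/dd layout).
  def loadPath(basePath: String, queryType: String, partitionDepth: Int): String =
    queryType match {
      // Goes through Spark's default relation resolution, which expands
      // globs: one "/*" per partition level plus one for the data files.
      case ReadOptimized => basePath + "/*" * (partitionDepth + 1)
      // The custom incremental relation cannot handle globs, so it needs
      // the bare base path.
      case Incremental => basePath
      case other =>
        throw new IllegalArgumentException(s"Unsupported query type: $other")
    }
}

// Intended use with the datasource (requires Spark and Hudi on the classpath):
//   spark.read.format("org.apache.hudi")
//     .option("hoodie.datasource.query.type", queryType)
//     .load(LoadPathSketch.loadPath(basePath, queryType, 3))
```

   With a path handler in place, this kind of branching would live inside the
datasource instead of in user code.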





