garyli1019 commented on a change in pull request #1722:
URL: https://github.com/apache/hudi/pull/1722#discussion_r445305619
##########
File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
##########
@@ -57,8 +57,7 @@ class DefaultSource extends RelationProvider
if (path.isEmpty) {
throw new HoodieException("'path' must be specified.")
}
-
- if (parameters(QUERY_TYPE_OPT_KEY).equals(QUERY_TYPE_SNAPSHOT_OPT_VAL)) {
+ if (parameters(QUERY_TYPE_OPT_KEY).equals(QUERY_TYPE_READ_OPTIMIZED_OPT_VAL)) {
// this is just effectively RO view only, where `path` can contain a mix of
Review comment:
This is hard to do without a path handler. Today we tell users to append a
glob pattern for COW tables, as in `load(basePath + "/*/*/*/*")`, so the glob
ends up inside the `basePath` we receive. Spark's default
`DataSource.apply().resolveRelation()` is able to handle the glob, but our
custom relations are not. This is why our incremental relation requires a plain
`.load(basePath)`. Udit's PR adds this path handler, so we will have a single
place to handle all the paths.
I think we can change the default option to `READ_OPTIMIZED`, so there is no
impact on the user side. Currently the Spark datasource only supports snapshot
queries on COW tables anyway, which is the same as `READ_OPTIMIZED`. We can
switch back later once the path handler is in:
https://github.com/apache/hudi/pull/1702/files#diff-683cf2c70477ed6cc0a484a2ae494999R72
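To make the two load styles above concrete, here is a minimal sketch of what the user side looks like today. It assumes a Spark session with the Hudi bundle on the classpath and an existing table at `basePath`. The object and method names (`HudiLoadSketch`, `readOptimized`, `incremental`) are hypothetical, the option keys are written out as the string values I understand `DataSourceReadOptions` to define, and the four-level glob is just the one quoted above (the right depth depends on the partition layout).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, for illustration only (not part of Hudi).
object HudiLoadSketch {
  // Assumed string value behind QUERY_TYPE_OPT_KEY.
  val QueryTypeKey = "hoodie.datasource.query.type"
  // Assumed string value behind the begin-instant option of incremental queries.
  val BeginInstantKey = "hoodie.datasource.read.begin.instanttime"

  // COW read-optimized query: resolved by Spark's default relation handling,
  // so the glob appended to basePath is accepted.
  def readOptimized(spark: SparkSession, basePath: String): DataFrame =
    spark.read
      .format("org.apache.hudi")
      .option(QueryTypeKey, "read_optimized")
      .load(basePath + "/*/*/*/*")

  // Incremental query: served by the custom relation, which expects the
  // bare base path with no glob.
  def incremental(spark: SparkSession, basePath: String, beginInstant: String): DataFrame =
    spark.read
      .format("org.apache.hudi")
      .option(QueryTypeKey, "incremental")
      .option(BeginInstantKey, beginInstant)
      .load(basePath)
}
```

With a path handler in place, both calls could presumably take the same `basePath` form, which is the point of deferring the default switch until then.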