n3nash commented on a change in pull request #2611:
URL: https://github.com/apache/hudi/pull/2611#discussion_r584465066
##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieHiveUtils.java
##########
@@ -62,6 +67,7 @@
public static final String HOODIE_STOP_AT_COMPACTION_PATTERN = "hoodie.%s.ro.stop.at.compaction";
public static final String INCREMENTAL_SCAN_MODE = "INCREMENTAL";
public static final String SNAPSHOT_SCAN_MODE = "SNAPSHOT";
+ public static final String VALIDATE_SCAN_MODE = "VALIDATE"; // used for pre-commit validation
Review comment:
@satishkotha On thinking about this a little more, I feel one should be
able to "validate" in both `SNAPSHOT` and `INCREMENTAL` modes. Essentially,
what you want is a `SNAPSHOT @ commitTime` or `Incremental from or @`, which
is what time travel allows, except that time travel ensures we read only
committed data. To keep the scan-mode concepts as they are, you may want to
just add a flag such as `hoodie.%s.consume.uncommitted` that defaults to false.
When it is false, you always fall back to the `HoodieTableFileSystem` with the
current behavior. When it is true, you do what you are currently doing in the
"VALIDATE" scan mode, for snapshot mode to start with (it's hard for me to
reason about what incremental validate would look like). What do you think?
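
To make the suggestion concrete, here is a rough sketch of how such a flag could be resolved per table. It is only an illustration under assumptions: the class name, both method names, and the returned labels are hypothetical, and a plain `Map` stands in for the Hadoop job configuration so the snippet runs on its own. Only the key pattern `hoodie.%s.consume.uncommitted` and its default of false come from the comment above.

```java
import java.util.HashMap;
import java.util.Map;

public class ConsumeUncommittedSketch {
    // Key pattern proposed in the review comment; %s is the table name.
    static final String CONSUME_UNCOMMITTED_PATTERN = "hoodie.%s.consume.uncommitted";

    // Hypothetical helper: returns the flag for a table, defaulting to false.
    // A Map is an assumption standing in for the real job configuration object.
    static boolean shouldConsumeUncommitted(Map<String, String> conf, String tableName) {
        String key = String.format(CONSUME_UNCOMMITTED_PATTERN, tableName);
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    // Hypothetical dispatch: the labels only name the two branches described
    // in the comment; they are not existing Hudi constants.
    static String describeReadPath(Map<String, String> conf, String tableName) {
        return shouldConsumeUncommitted(conf, tableName)
            ? "UNCOMMITTED_VALIDATE"   // what the PR currently does in "VALIDATE" mode
            : "COMMITTED_ONLY";        // current behavior, committed data only
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(describeReadPath(conf, "trips"));
        conf.put("hoodie.trips.consume.uncommitted", "true");
        System.out.println(describeReadPath(conf, "trips"));
    }
}
```

With this shape the scan mode (`SNAPSHOT` or `INCREMENTAL`) stays an independent setting, and the flag only decides whether uncommitted data is visible to the reader.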
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]