n3nash commented on issue #2201:
URL: https://github.com/apache/hudi/issues/2201#issuecomment-716615631
@getniz You can perform time-travel queries using _hoodie_commit_time and
achieve your use case of Q1 inventory vs Q2 inventory. It would look something
like this ->
1) ingest_data_for_Q1 with some commit_time
2) ingest_data_for_Q2 with some new commit_time
3) Query against registered tables with begin & end commit instants to limit
the data against the commit time
Using Spark
```
Dataset<Row> hudiIncQueryDF = spark.read()
.format("org.apache.hudi")
.option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY(),
DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL())
.option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(),
<beginInstantTime>)
.option(DataSourceReadOptions.END_INSTANTTIME_OPT_KEY(),
<endInstantTime>)
.option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY(),
"/year=2020/month=*/day=*") // Optional, use glob pattern if querying certain
partitions
.load(tablePath); // For incremental query, pass in the root/base path
of table
hudiIncQueryDF.createOrReplaceTempView("hudi_trips_incremental")
spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from
hudi_trips_incremental where fare > 20.0").show()
```
Using Hive
```
hive_shell> set hoodie.source_table_name.consume.mode=incremental
hive_shell> set hoodie.table_name.consume.start.timestamp=<beginInstantTime>
convert_endInstantTime_to_num_commits_to_read=5
hive_shell> set hoodie.table_name.consume.max.commits=5
hive_shell> select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts
from source_table_name where fare > 20.0
```
Now, if you have deletes, currently HUDI will only apply those changes to
the latest "view" of this data. So, it will look like this
4) ingest_data_for_deletes with some new commit_time
Now, using this new commit_time you can time travel to Q1 or Q2 or
(Q1+Q2+deletes).
Is that what you're looking for ?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]