n3nash commented on issue #2201:
URL: https://github.com/apache/hudi/issues/2201#issuecomment-716615631


   @getniz You can perform time-travel queries using `_hoodie_commit_time` and cover your use case of comparing Q1 inventory with Q2 inventory. It would look something like this:
   1) Ingest the Q1 data; this write gets its own commit time.
   2) Ingest the Q2 data; this write gets a new, later commit time.
   3) Query the registered tables with begin and end commit instants to restrict the results to the commits in that range.
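   A minimal Java sketch of what step 3 selects, assuming a simplified model in which every row carries its commit instant in `_hoodie_commit_time`. Per the Hudi read-option docs, an incremental read returns data committed after the begin instant (exclusive) and up to the end instant (inclusive); the Hudi quickstart passes `"000"` as the begin instant for a point-in-time query. The `Row` record, keys and commit times below are hypothetical, and this models only the result, not how Hudi executes the read.

   ```java
   import java.util.List;
   import java.util.stream.Collectors;

   public class IncrementalWindow {

       // Hypothetical row: a record key plus the value of its _hoodie_commit_time column.
       record Row(String key, String hoodieCommitTime) {}

       // Keep rows with beginInstant < commit time <= endInstant.
       // Instant times are fixed-width digit strings, so string order matches time order.
       static List<Row> window(List<Row> rows, String beginInstant, String endInstant) {
           return rows.stream()
                   .filter(r -> r.hoodieCommitTime().compareTo(beginInstant) > 0)
                   .filter(r -> r.hoodieCommitTime().compareTo(endInstant) <= 0)
                   .collect(Collectors.toList());
       }

       public static void main(String[] args) {
           String q1Commit = "20200331120000"; // hypothetical commit time of the Q1 ingest
           String q2Commit = "20200630120000"; // hypothetical commit time of the Q2 ingest
           List<Row> rows = List.of(
                   new Row("sku-1", q1Commit),
                   new Row("sku-2", q1Commit),
                   new Row("sku-1", q2Commit));
           // Everything up to and including the Q1 commit ("000" = earliest possible instant).
           System.out.println(window(rows, "000", q1Commit));
           // Only what the Q2 ingest wrote.
           System.out.println(window(rows, q1Commit, q2Commit));
       }
   }
   ```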
   
   Using Spark
   
   ```java
   import org.apache.hudi.DataSourceReadOptions;
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;

   // Assumes an existing SparkSession `spark` and the table's base path in `tablePath`.
   Dataset<Row> hudiIncQueryDF = spark.read()
        .format("org.apache.hudi")
        .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY(),
                DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL())
        .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(), "<beginInstantTime>")
        .option(DataSourceReadOptions.END_INSTANTTIME_OPT_KEY(), "<endInstantTime>")
        // Optional: use a glob pattern if querying certain partitions
        .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY(), "/year=2020/month=*/day=*")
        // For an incremental query, pass in the root/base path of the table
        .load(tablePath);

   hudiIncQueryDF.createOrReplaceTempView("hudi_trips_incremental");
   spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts "
           + "from hudi_trips_incremental where fare > 20.0").show();
   ```
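   The `<beginInstantTime>` and `<endInstantTime>` placeholders are instant-time strings. Hudi documents that the begin instant need not correspond to an actual instant on the timeline, so a window can also be derived from wall-clock bounds. A sketch, assuming the second-granularity `yyyyMMddHHmmss` instant format written by Hudi releases of this period (check what your version stores in `_hoodie_commit_time`); the dates below are hypothetical.

   ```java
   import java.time.LocalDateTime;
   import java.time.format.DateTimeFormatter;

   public class InstantTimes {

       // Assumption: instant times use the yyyyMMddHHmmss layout.
       private static final DateTimeFormatter INSTANT_FORMAT =
               DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

       static String toInstant(LocalDateTime t) {
           return INSTANT_FORMAT.format(t);
       }

       public static void main(String[] args) {
           // Hypothetical window: commits made between 2020-04-01 and the end of 2020-06-30.
           String begin = toInstant(LocalDateTime.of(2020, 3, 31, 23, 59, 59)); // exclusive bound
           String end = toInstant(LocalDateTime.of(2020, 6, 30, 23, 59, 59));   // inclusive bound
           System.out.println(begin + " -> " + end); // 20200331235959 -> 20200630235959
       }
   }
   ```

   The resulting strings can be passed as the begin and end instant options in the Spark query.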
   Using Hive
   
   ```
   hive_shell> set hoodie.source_table_name.consume.mode=INCREMENTAL;
   hive_shell> set hoodie.source_table_name.consume.start.timestamp=<beginInstantTime>;
   -- There is no end-instant setting: convert <endInstantTime> into the number of commits to read (5 here)
   hive_shell> set hoodie.source_table_name.consume.max.commits=5;
   hive_shell> select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from source_table_name where fare > 20.0;
   ```
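   The Hive settings take a start timestamp and a maximum number of commits instead of an end instant, so the end instant has to be turned into a commit count. One way to do that, sketched under the assumption that you already have the table's completed commit instants in timeline order (hard-coded and hypothetical here): count the commits after the start timestamp up to and including the end instant.

   ```java
   import java.util.List;

   public class MaxCommits {

       // Number of completed commits t with beginInstant < t <= endInstant.
       // This is the value to use for hoodie.<table>.consume.max.commits.
       static int commitsToRead(List<String> completedInstants, String beginInstant, String endInstant) {
           int n = 0;
           for (String t : completedInstants) {
               if (t.compareTo(beginInstant) > 0 && t.compareTo(endInstant) <= 0) {
                   n++;
               }
           }
           return n;
       }

       public static void main(String[] args) {
           // Hypothetical timeline of completed commits, oldest first.
           List<String> timeline = List.of(
                   "20200331120000", "20200415080000", "20200520080000",
                   "20200630120000", "20200701090000");
           System.out.println(commitsToRead(timeline, "20200331120000", "20200630120000")); // 3
       }
   }
   ```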
   
   If you also have deletes, note that Hudi currently applies them only to the latest "view" of the data. The sequence then looks like this:
   4) Ingest the deletes; this write gets another new commit time.
   Using these commit times, you can still time travel to Q1, to Q2, or to Q1 + Q2 with the deletes applied.
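   To make the deletes case concrete, here is a toy Java model, not Hudi's implementation: the table state as of an instant is what you get by replaying every commit up to that instant, so a later delete commit changes the latest view but not the views as of the Q1 or Q2 commits. The `Commit` record, SKUs and quantities are hypothetical.

   ```java
   import java.util.List;
   import java.util.Map;
   import java.util.TreeMap;

   public class PointInTimeView {

       // Hypothetical commit: upserted key -> quantity pairs plus deleted keys.
       record Commit(String instant, Map<String, Integer> upserts, List<String> deletes) {}

       // Replay commits with instant <= asOf, in order, to get the state at that instant.
       static Map<String, Integer> viewAsOf(List<Commit> timeline, String asOf) {
           Map<String, Integer> state = new TreeMap<>();
           for (Commit c : timeline) {
               if (c.instant().compareTo(asOf) > 0) {
                   break; // timeline is assumed to be sorted by instant
               }
               state.putAll(c.upserts());
               c.deletes().forEach(state::remove);
           }
           return state;
       }

       public static void main(String[] args) {
           List<Commit> timeline = List.of(
                   new Commit("20200331120000", Map.of("sku-1", 10, "sku-2", 5), List.of()), // Q1
                   new Commit("20200630120000", Map.of("sku-1", 7, "sku-3", 4), List.of()),  // Q2
                   new Commit("20200701090000", Map.of(), List.of("sku-2")));               // deletes
           System.out.println(viewAsOf(timeline, "20200331120000")); // {sku-1=10, sku-2=5}
           System.out.println(viewAsOf(timeline, "20200630120000")); // {sku-1=7, sku-2=5, sku-3=4}
           System.out.println(viewAsOf(timeline, "20200701090000")); // {sku-1=7, sku-3=4}
       }
   }
   ```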
   
   Is that what you're looking for?
   

