govorunov opened a new issue #3756:
URL: https://github.com/apache/hudi/issues/3756


   Hi,
   
   I have read all the documentation and the FAQ, and I get the feeling that 
Hudi is (almost) the right tool for what I'm trying to build, but I am still 
unable to design the right solution:
   
   We need to build a temporal representation of data stored in a database, 
i.e. a snapshot of a database table that also stores the history of all changes 
to that table and provides a means to query the table state at different points 
in time. Hudi covers almost all of these needs:
   
   -  we can query a 'point in time' using option("as.of.instant", ...)
   -  we can run incremental queries and query the changes for a certain 'time 
span' only.
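
   For reference, the two read modes above can be sketched as Spark 
datasource options. This is only a sketch: the helper function names are 
hypothetical, the instant values are placeholders, and a Hudi-enabled `spark` 
session and a table `base_path` are assumed to exist.

```python
# Helpers that build reader options for the two query modes mentioned above.
# The option keys are Hudi datasource options; the function names are
# hypothetical and exist only for this illustration.

def point_in_time_options(instant):
    """Options for a time-travel read of the table as of `instant`."""
    return {"as.of.instant": instant}


def incremental_options(begin_instant, end_instant=None):
    """Options for reading only the changes committed within a time span."""
    opts = {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }
    if end_instant is not None:
        opts["hoodie.datasource.read.end.instanttime"] = end_instant
    return opts


# Assumed usage (requires a Spark session with the Hudi bundle on the classpath):
#   df = (spark.read.format("hudi")
#         .options(**point_in_time_options("20210901000000"))
#         .load(base_path))
```

   In both modes the instants refer to commits on the Hudi timeline, not to 
any column of the payload.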
   
   However, it seems that this mechanism is based on the '_hoodie_commit_time' 
column of the table, which represents the moment in time when the data was 
written into the Hudi table. In our case, not all changes are happening now: 
there are older versions of the database (backups) that we need to insert into 
the datastore at the proper point in time, months or years in the past. We need 
to be able to query these versions using 'point in time' queries, as well as 
see data from these older versions in the current snapshot. The temporal 
component in this case is not 'now', but rather part of the payload itself (a 
DataFrame column or even an option value). Is there a way of bulk-inserting 
records into a Hudi table at some 'point in time' other than 'now'? Ideally, 
real-time changes would also continue to be ingested with their proper 
timestamps.
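
   To make the requirement concrete, here is a toy model in plain Python. It 
is not Hudi code: the `Version` class, the `as_of` function, and the sample 
rows are all made up for illustration. It compares an 'as of' lookup keyed on 
the write time with one keyed on a timestamp carried in the payload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    key: str
    valid_from: str   # timestamp carried in the payload (e.g. the backup date)
    commit_time: str  # moment the row was written into the store
    value: str


def as_of(versions, instant, time_field):
    """Return the latest value per key whose `time_field` is <= `instant`.

    ISO-8601 date strings compare correctly as plain strings.
    """
    latest = {}
    for v in versions:
        t = getattr(v, time_field)
        if t > instant:
            continue
        current = latest.get(v.key)
        if current is None or getattr(current, time_field) < t:
            latest[v.key] = v
    return {k: v.value for k, v in latest.items()}


# A live change is written first; a 2019 backup is loaded one day later.
history = [
    Version("row1", "2021-09-30", "2021-10-01", "live change"),
    Version("row1", "2019-01-15", "2021-10-02", "from 2019 backup"),
]

by_payload_2020 = as_of(history, "2020-01-01", "valid_from")
by_commit_2020 = as_of(history, "2020-01-01", "commit_time")
by_payload_now = as_of(history, "2021-10-03", "valid_from")
by_commit_now = as_of(history, "2021-10-03", "commit_time")
```

   Keyed on the payload timestamp, the 2020 lookup returns the backup row and 
the current snapshot returns the live change, which is the behaviour we are 
after. Keyed on the write time, the 2020 lookup returns nothing and the late 
backup load shadows the live change in the current snapshot.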
   
   Thank you!
   

