govorunov opened a new issue #3756:
URL: https://github.com/apache/hudi/issues/3756
Hi,
I have read all the documentation and the FAQ, and I have a feeling Hudi is
(almost) the right tool for what I'm trying to build, but I am still unable to
design the right solution:
We need to build a temporal representation of data stored in some database,
i.e. snapshot of a database table that also stores the history of all changes
to that table and provides means to query table state at different points in
time. Hudi answers almost all the questions:
- we can query 'point in time' using option("as.of.instant",...)
- we can run incremental queries that return only the changes within a certain
'time span'.
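For reference, this is roughly how I understand the two read modes, written as
plain option dictionaries for a Spark reader. It is only a minimal sketch: the
helper names are hypothetical, and the option keys other than "as.of.instant"
are the ones I believe the Hudi Spark datasource uses for incremental reads.

```python
from typing import Dict, Optional


def time_travel_options(instant: str) -> Dict[str, str]:
    """Read options for a 'point in time' query at a given commit instant."""
    return {"as.of.instant": instant}


def incremental_options(begin_instant: str,
                        end_instant: Optional[str] = None) -> Dict[str, str]:
    """Read options for an incremental query over a span of commit instants."""
    opts = {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }
    # The end instant is optional; when omitted, the span is left open-ended.
    if end_instant is not None:
        opts["hoodie.datasource.read.end.instanttime"] = end_instant
    return opts


# Assumed usage, given an existing SparkSession `spark` and a table path
# `base_path` (both hypothetical here):
# df = (spark.read.format("hudi")
#       .options(**time_travel_options("20210728141108"))
#       .load(base_path))
```

In both cases the instants refer to commit times, i.e. the time of the write.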
However, this mechanism seems to be based on the '_hoodie_commit_time' column
of the table, which records the moment when the data was written into the Hudi
table. In our case, not all changes are happening now: there are older
versions of the database (backups), months or years old, that we need to
insert into the datastore at their proper point in time. We need to be able to
query these versions using 'point in time' queries, and also to see data from
these older versions in the current snapshot. The temporal component in this
case is not 'now', but rather part of the payload itself (a DataFrame column or
even an option value). Is there a way of bulk-inserting records into a Hudi
table at some 'point in time' other than 'now'? Ideally, this would work while
real-time changes are also being ingested with their proper timestamps.
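To make the requirement concrete, here is a minimal in-memory sketch of the
semantics I'm after. It is plain Python, not Hudi, and all names (Version,
as_of, effective_ts) are hypothetical. Each record version carries its own
effective timestamp from the payload, and a 'point in time' query picks, per
key, the latest version whose effective time is at or before the requested
instant, regardless of the order in which the versions were ingested.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Version:
    key: str
    effective_ts: str  # yyyyMMddHHmmss from the payload, not the write time
    value: str


def as_of(versions: Iterable[Version], instant: str) -> Dict[str, str]:
    """Return the table state at `instant`, based on effective_ts only."""
    latest: Dict[str, Version] = {}
    for v in versions:
        # Fixed-width digit strings compare chronologically.
        if v.effective_ts > instant:
            continue
        current = latest.get(v.key)
        if current is None or v.effective_ts > current.effective_ts:
            latest[v.key] = v
    return {k: v.value for k, v in latest.items()}


# Ingestion order is deliberately not chronological: the live change is
# written first, the old backups are loaded afterwards.
history = [
    Version("row-1", "20210901000000", "live change"),
    Version("row-1", "20190101000000", "from old backup"),
    Version("row-2", "20200615000000", "from old backup"),
]

state_2019 = as_of(history, "20191231235959")
state_now = as_of(history, "20211001000000")
```

Here state_2019 should contain only the backup version of row-1, while
state_now should show the live change for row-1 plus the backup row-2.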
Thank you!