govorunov commented on issue #3756:
URL: https://github.com/apache/hudi/issues/3756#issuecomment-937267572


   I think I need to elaborate a little further:
   
   1. If we write all database backups into the Hudi table in their historical 
order, then take a snapshot of the live database, and only then start consuming 
new changes, all events land in the Hudi table in proper chronological order. 
That ordering is useless, though, because all the dates will be off: events are 
recorded under the time they were written into the Hudi table, not the time the 
event itself occurred.
   2. If we partition the Hudi table by event date, we can query time ranges 
properly, but such a query simply returns all the events. To do a 'point in 
time' query we would have to read all historical data and then combine 
duplicate events by their 'event time'. That is possible, although slow, and 
then what is the reason for using Hudi at all, since we can do the same with 
bare Parquet?
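
To make the 'point in time' query from item 2 concrete, here is a minimal sketch in plain Python. It is not Hudi API; the field names `key`, `event_time` and `value` and the function `point_in_time` are hypothetical, chosen only for illustration. It scans every event and keeps, per record key, the latest version at or before the requested time, which is exactly the full-history scan described above.

```python
from datetime import datetime

# Hypothetical change events: several versions of the same record key,
# each stamped with the time the event occurred (not the write time).
events = [
    {"key": "a", "event_time": datetime(2021, 10, 1), "value": 1},
    {"key": "b", "event_time": datetime(2021, 10, 2), "value": 10},
    {"key": "a", "event_time": datetime(2021, 10, 5), "value": 2},
]

def point_in_time(events, as_of):
    """Return {key: event} holding the latest version of each key at `as_of`."""
    latest = {}
    for ev in events:
        # Ignore anything that happened after the requested point in time.
        if ev["event_time"] > as_of:
            continue
        current = latest.get(ev["key"])
        # Keep the version with the greatest event time seen so far.
        if current is None or ev["event_time"] >= current["event_time"]:
            latest[ev["key"]] = ev
    return latest

snapshot = point_in_time(events, datetime(2021, 10, 3))
```

The cost is linear in the total number of events ever written, regardless of how the data is partitioned, which is why this approach gets slow at large volumes.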
   
   If I am asking about a use case Hudi was not intended to handle, can someone 
suggest the right tool for me? I have been looking into temporal databases for 
quite some time and still cannot find a solution capable of organizing and 
querying data in historical order while storing large volumes of data 
(petabytes of it).
   
   Thanks!



