refset commented on issue #3756:
URL: https://github.com/apache/hudi/issues/3756#issuecomment-940232664


   Hi @govorunov, I was just taking a look at Hudi myself, so I'm certainly no 
expert, but I think you are looking for "bitemporal" as-of queries, where 
`commit time` (aka `transaction time`) and `valid time` are indexed and queried 
independently; see e.g. [this documentation page from 
XTDB](https://xtdb.com/articles/bitemporality.html). Full disclosure: I work on 
XTDB, and I can say that XT's current architecture is not designed to handle 
PBs of data efficiently/cheaply without significant userspace sharding. 
However, we have a new storage & query architecture in the works that could get 
us a lot closer to that PB level, though perhaps still not to the scale that 
Hudi already operates at.
   
   From my very brief look at Hudi's design, I would be surprised if there 
isn't some way to index your own valid time construct using derived tables and 
then query against it efficiently (if inelegantly) via Spark / Presto etc.
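   
   To make the bitemporal idea concrete, here is a minimal sketch in plain 
Python. It is not Hudi's or XTDB's API: the `Version` record, the `as_of` 
function and the integer timestamps are all hypothetical names chosen for 
illustration, and a linear scan stands in for whatever index a real engine 
would use. It only shows how `valid time` and `commit time` are filtered 
independently.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Version:
    """One recorded fact about a key (hypothetical record layout)."""
    key: str
    value: str
    valid_from: int   # valid time: when the fact becomes true in the real world
    commit_time: int  # commit/transaction time: when the writer recorded it


def as_of(versions: Iterable[Version], key: str,
          valid_time: int, commit_time: int) -> Optional[Version]:
    """Return the fact for `key` in effect at `valid_time`,
    using only what had been committed by `commit_time`."""
    candidates = [
        v for v in versions
        if v.key == key
        and v.valid_from <= valid_time    # already true at the requested valid time
        and v.commit_time <= commit_time  # already known at the requested commit time
    ]
    if not candidates:
        return None
    # The latest valid_from wins; ties go to the most recent commit (a correction).
    return max(candidates, key=lambda v: (v.valid_from, v.commit_time))


history = [
    Version("acct-1", "bronze", valid_from=10, commit_time=10),
    Version("acct-1", "silver", valid_from=20, commit_time=20),
    # Retroactive correction: recorded at commit time 30, effective from valid time 15.
    Version("acct-1", "gold", valid_from=15, commit_time=30),
]

before_correction = as_of(history, "acct-1", valid_time=17, commit_time=25)
after_correction = as_of(history, "acct-1", valid_time=17, commit_time=30)
```

   Asking about the same `valid_time` at two different `commit_time` values 
gives different answers here, because the retroactive "gold" record only 
becomes visible once its commit is included. A plain commit-time-only 
time-travel query cannot express that distinction.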



