refset commented on issue #3756: URL: https://github.com/apache/hudi/issues/3756#issuecomment-940232664
Hi @govorunov, I was just taking a look at Hudi myself, so I'm certainly no expert, but I think you are looking for "bitemporal" as-of queries, where `commit time` (aka `transaction time`) and `valid time` are indexed and queried independently; see, for example, [this documentation page from XTDB](https://xtdb.com/articles/bitemporality.html).

Full disclosure, though: I work on XTDB, and I can say that XT's current architecture is not designed to handle PBs of data efficiently or cheaply without significant userspace sharding. However, we have a new storage and query architecture in the works that could get us a lot closer to that PB level, though perhaps still not at the scale Hudi already operates at.

From my very brief look at Hudi's design, I would be surprised if there weren't some way to index your own valid-time construct using derived tables and then query it efficiently (if inelegantly) via Spark, Presto, etc.
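
To make the bitemporal idea concrete, here is a minimal sketch in plain Python. It is not Hudi or XTDB API: the `Version` record, the `as_of` function, the `sensor-1` data, and the integer timestamps are all hypothetical names chosen for illustration, and the lookup is a linear scan rather than an indexed one. It assumes the usual rule of taking the latest `valid time` at or before the requested one, among rows already committed at the requested `commit time`, with the most recent commit winning ties.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Version:
    """One stored version of a record (hypothetical shape, for illustration)."""
    key: str
    value: str
    valid_time: int   # when the fact became true in the real world
    commit_time: int  # when the row was written (aka transaction time)


def as_of(rows: Iterable[Version], key: str,
          valid_time: int, commit_time: int) -> Optional[Version]:
    """Return the version of `key` visible at (valid_time, commit_time).

    Only rows committed at or before `commit_time` and valid at or before
    `valid_time` are considered. Among those, the latest valid_time wins,
    and the latest commit_time breaks ties (so corrections supersede).
    """
    candidates = [
        r for r in rows
        if r.key == key
        and r.valid_time <= valid_time
        and r.commit_time <= commit_time
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.valid_time, r.commit_time))


rows = [
    Version("sensor-1", "A", valid_time=10, commit_time=10),
    Version("sensor-1", "B", valid_time=20, commit_time=20),
    # A late correction, committed at 30 but back-dated to valid time 10.
    Version("sensor-1", "A-corrected", valid_time=10, commit_time=30),
]

# What did we believe at commit time 25 about valid time 15?
before_fix = as_of(rows, "sensor-1", valid_time=15, commit_time=25)
# What do we believe at commit time 35 about the same valid time?
after_fix = as_of(rows, "sensor-1", valid_time=15, commit_time=35)
print(before_fix.value, after_fix.value)
```

The two lookups ask about the same `valid time` but different `commit time`s, which is what a commit-time-only as-of query cannot express. The same filter-then-take-latest pattern could presumably be written over a derived table in Spark or Presto, with valid time stored as an ordinary column.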
