qian heng created HUDI-1460:
-------------------------------
Summary: Time Travel (querying the historical versions of data)
ability for Hudi Table
Key: HUDI-1460
URL: https://issues.apache.org/jira/browse/HUDI-1460
Project: Apache Hudi
Issue Type: New Feature
Components: Common Core
Reporter: qian heng
Hi, all:
We plan to use Hudi to sync MySQL binlog data. A Flink ETL task will
consume binlog records from Kafka and write them to Hudi once every hour. The
binlog records are also grouped by hour, and all records for a given hour are
saved in one commit. The data pipeline looks like this: binlog
-> Kafka -> Flink -> Parquet.
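To make the hourly grouping concrete, here is a minimal sketch of bucketing
binlog records by hour so that each bucket could become one commit. It is not
Flink or Hudi code. The record shape (a dict with a `ts_ms` epoch-millisecond
field), the UTC time zone, and the names `hour_bucket` and `group_by_hour` are
all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_bucket(ts_ms):
    """Return an hour label (yyyyMMddHH, UTC) for an epoch-millisecond timestamp."""
    dt = datetime.fromtimestamp(ts_ms // 1000, tz=timezone.utc)
    return dt.strftime("%Y%m%d%H")


def group_by_hour(records):
    """Group records by hour label; each group stands for one hourly commit.

    `records` is an iterable of dicts carrying a hypothetical `ts_ms` field.
    """
    buckets = defaultdict(list)
    for rec in records:
        buckets[hour_bucket(rec["ts_ms"])].append(rec)
    # Sort by label so the hourly batches come out in time order.
    return dict(sorted(buckets.items()))
```

Each key of the returned dict identifies one hour, and its value holds every
record that would be written in that hour's commit.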
After the data is synced to Hudi, we want to query the historical hourly
versions of the Hudi table in Hive SQL.
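One way to picture such a query: given the list of completed commit times, a
read "as of" some point in time uses the latest commit at or before that
point. The sketch below shows only that selection step. It assumes commit
times are fixed-width strings such as yyyyMMddHHmmss, so that string order
matches time order. The function name `resolve_instant` is hypothetical, and
this is not the approach described in the linked design document.

```python
from bisect import bisect_right


def resolve_instant(commit_instants, as_of):
    """Return the latest commit instant <= as_of, or None if none exists.

    Both arguments use the same fixed-width timestamp string format
    (assumed here to be yyyyMMddHHmmss), so lexicographic order equals
    chronological order.
    """
    ordered = sorted(commit_instants)
    idx = bisect_right(ordered, as_of)
    return ordered[idx - 1] if idx > 0 else None
```

With hourly commits, asking for the table as of 02:30 would resolve to the
02:00 commit, and a time earlier than the first commit resolves to nothing.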
Here is a more detailed description of our issue, along with a simple design of
Time Travel for Hudi. The design is still under development and testing:
[https://docs.google.com/document/d/1r0iwUsklw9aKSDMzZaiq43dy57cSJSAqT9KCvgjbtUo/edit#]
We need to support Time Travel soon for our business needs. We have also
seen [RFC
07|https://cwiki.apache.org/confluence/display/HUDI/RFC+-+07+%3A+Point+in+time+Time-Travel+queries+on+Hudi+table].
We would be glad to receive any suggestions or discussion.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)