[
https://issues.apache.org/jira/browse/KUDU-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886568#comment-16886568
]
Adar Dembo commented on KUDU-2673:
----------------------------------
Have you considered using a bi-temporal model for your schema? The idea is to
include two timestamp columns in the schema: one that tracks when an event
started being valid, and another that tracks when the event became invalid.
Your table then looks like a standard fact table in that the vast majority of
operations are inserts; updates are only used to correct mistakes made at
insert-time. And you use special predicates to select the set of valid events
at a given point in time.
Cloudera's blog has a post covering [bi-temporal data
modeling|https://blog.cloudera.com/blog/2017/05/bi-temporal-data-modeling-with-envelope/];
maybe you'll find it useful.
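A minimal sketch of the idea, in plain Python rather than against any Kudu API. The names `EventRow`, `valid_from`, `valid_to`, `OPEN_ENDED`, and `valid_at` are all hypothetical and chosen for illustration; the sample rows are made up. It only shows the two timestamp columns and the kind of predicate that selects the row valid at a given event time.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sentinel meaning "this version has not become invalid yet".
OPEN_ENDED = 2**63 - 1


@dataclass(frozen=True)
class EventRow:
    key: str
    value: str
    valid_from: int                # event time at which the row started being valid
    valid_to: int = OPEN_ENDED     # event time at which the row became invalid


def valid_at(rows: List[EventRow], key: str, ts: int) -> Optional[EventRow]:
    """Return the version of `key` that was valid at event time `ts`, if any.

    The predicate is the half-open interval valid_from <= ts < valid_to.
    """
    for row in rows:
        if row.key == key and row.valid_from <= ts < row.valid_to:
            return row
    return None


# Illustrative data: three versions of the same key, stored as plain inserts.
rows = [
    EventRow("sensor-1", "a", valid_from=100, valid_to=200),
    EventRow("sensor-1", "b", valid_from=200, valid_to=300),
    EventRow("sensor-1", "c", valid_from=300),
]

row = valid_at(rows, "sensor-1", 150)
print(row.value if row else None)
```

In a real table the same interval predicate would be expressed as scan predicates on the two timestamp columns; how that maps onto a particular client or SQL engine is not covered here.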
> Event timestamp support with kudu.
> ----------------------------------
>
> Key: KUDU-2673
> URL: https://issues.apache.org/jira/browse/KUDU-2673
> Project: Kudu
> Issue Type: New Feature
> Components: java, spark, tserver
> Reporter: yangz
> Priority: Major
> Labels: features, roadmap-candidate
>
> Kudu can read historical data, but only based on the timestamps produced
> by Kudu's own transaction and MVCC system. Relying on those internal
> timestamps greatly weakens usability.
> In our use case, we write data to Kudu from a data stream and use range
> partitioning by day.
> We want to read hourly versions from Kudu, so we need to read historical
> data, which is produced from the undo files.
> But when a user supplies a timestamp, they mean the time the event happened,
> which is associated with the data, not the timestamp Kudu produced.
> So we need a way to pass the event timestamp to Kudu.
> In the end, we found a way to solve this problem.
> But our solution has two limitations.
> # We only update the table one row at a time, and each row carries its own
> timestamp.
> # To get the correct historical version of the data, the data stream must
> send data in event-time order.
> Despite these limitations, it satisfies our current business needs.
>
> Our implementation also partly solves the out-of-order event-time problem
> if you only need the newest data, which does not require reading undo files.
> For data sent to Kudu with t1 < t2:
> t1 upsert -> t2 upsert -> newest will be the t2 value
> t2 upsert -> t1 upsert -> t1 with the current Kudu implementation, t2 with
> our implementation.
>
> Our solution may not be the best one for this problem, but I think Kudu
> snapshot reads should support event time.
> It does not cover all use cases, but I hope it will be useful to the
> community in some of them.
>
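The ordering behaviour described in the quoted issue (t2 arriving before t1) can be illustrated with a small sketch. This is not the reporter's patch and does not use any Kudu API; it is a plain-Python comparison of "last arrival wins" against "latest event time wins", with hypothetical function names and a dict standing in for the table.

```python
from typing import Dict, Tuple

# key -> (event_time, value); a stand-in for the newest version of each row.
Store = Dict[str, Tuple[int, str]]


def upsert_by_arrival(store: Store, key: str, value: str, event_time: int) -> None:
    """Whatever arrives last overwrites the row, regardless of event time."""
    store[key] = (event_time, value)


def upsert_by_event_time(store: Store, key: str, value: str, event_time: int) -> None:
    """Overwrite only if the incoming event time is not older than the stored one."""
    current = store.get(key)
    if current is None or event_time >= current[0]:
        store[key] = (event_time, value)


t1, t2 = 1, 2  # event times with t1 < t2

arrival: Store = {}
event: Store = {}

# Out-of-order delivery: the t2 upsert arrives first, then the t1 upsert.
for ts, val in [(t2, "v2"), (t1, "v1")]:
    upsert_by_arrival(arrival, "row", val, ts)
    upsert_by_event_time(event, "row", val, ts)

print(arrival["row"])  # (1, 'v1'): the late t1 write replaced the t2 value
print(event["row"])    # (2, 'v2'): the t2 value is kept
```

This only covers the newest value of a row; reconstructing correct historical versions from out-of-order writes is a separate problem that the sketch does not address.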