[jira] [Commented] (KUDU-2673) Event timestamp support with kudu.

Adar Dembo (JIRA) Tue, 16 Jul 2019 17:12:12 -0700


    [ 
https://issues.apache.org/jira/browse/KUDU-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886568#comment-16886568
 ]


Adar Dembo commented on KUDU-2673:
----------------------------------

Have you considered using a bi-temporal model for your schema? The idea is to 
include two timestamp columns in the schema: one that tracks when an event 
started being valid, and another that tracks when the event became invalid. 
Your table then looks like a standard fact table in that the vast majority of 
operations are inserts; updates are only used to correct mistakes made at 
insert-time. And you use special predicates to select the set of valid events 
at a given point in time.
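
To make the "special predicates" concrete, here is a minimal sketch in plain 
Python rather than against any Kudu client API. The column names valid_from 
and valid_to, the FAR_FUTURE sentinel, and the valid_at helper are all 
assumptions made for illustration; how the end timestamp gets populated is 
left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Hypothetical sentinel meaning "still valid"; not a Kudu constant.
FAR_FUTURE = datetime.max


@dataclass(frozen=True)
class EventRow:
    key: str
    value: int
    valid_from: datetime  # when this version started being valid (assumed name)
    valid_to: datetime    # when this version became invalid (assumed name)


def valid_at(rows: List[EventRow], as_of: datetime) -> List[EventRow]:
    """Return the row versions with valid_from <= as_of < valid_to."""
    return [r for r in rows if r.valid_from <= as_of < r.valid_to]


rows = [
    EventRow("sensor-1", 10, datetime(2019, 7, 16, 9), datetime(2019, 7, 16, 10)),
    EventRow("sensor-1", 12, datetime(2019, 7, 16, 10), FAR_FUTURE),
]

# Select the version that was valid at 09:30 event time.
hourly = valid_at(rows, datetime(2019, 7, 16, 9, 30))
print([(r.key, r.value) for r in hourly])  # prints [('sensor-1', 10)]
```

In a real table the same half-open range check would presumably be expressed 
as two comparison predicates on the timestamp columns of a scan.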

Cloudera's blog has a post covering [bi-temporal data 
modeling|https://blog.cloudera.com/blog/2017/05/bi-temporal-data-modeling-with-envelope/];
 maybe you'll find it useful.


> Event timestamp support with kudu.
> ----------------------------------
>
>                 Key: KUDU-2673
>                 URL: https://issues.apache.org/jira/browse/KUDU-2673
>             Project: Kudu
>          Issue Type: New Feature
>          Components: java, spark, tserver
>            Reporter: yangz
>            Priority: Major
>              Labels: features, roadmap-candidate
>
> Kudu has the ability to read historical data, but only based on the
> timestamps produced by Kudu's transaction and MVCC system. Relying on those
> internal timestamps greatly weakens usability.
> In our use case, we write data to Kudu from a data stream, and we range
> partition by day.
> We want to get hourly versions of the data from Kudu, so we need to read
> historical data, which is produced from the undo files. But when a user
> supplies a timestamp, they mean the time at which the event happened, which
> is associated with the data, not the timestamp that Kudu produced. So we need
> a way to pass the event timestamp to Kudu.
> Eventually we found a way to solve this problem, but our solution has two
> limitations:
>  # We only update the table one row at a time, and each row has a single
> timestamp associated with it.
>  # To get the right historical version of the data, the data stream must
> send data in event-time order.
> Despite these limitations, it satisfies our current business needs.
>  
> Our implementation also partly solves the problem of out-of-order event
> times when you only need the newest data, which does not require reading
> undo files.
> For data sent to Kudu with t1 < t2:
> t1 upsert -> t2 upsert      ->    newest will be the t2 value
> t2 upsert -> t1 upsert      ->    t1 with the current Kudu implementation,
> t2 with ours.
>  
> Maybe our solution is not the best one for this problem, but I think Kudu
> snapshot reads should support event time.
> Our solution does not cover all use cases, but I hope it will be useful to
> the community in some cases.
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
