[ 
https://issues.apache.org/jira/browse/HUDI-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324708#comment-17324708
 ] 

Vinoth Chandar commented on HUDI-251:
-------------------------------------

Hi [~codope], 1 also helps us determine the checkpoint value to use for the 
next run?

On 2, I guess the idea is to translate this to a LIMIT on the source SQL? If 
so, do we need some sorting by the checkpoint column? Otherwise we'll 
potentially miss some records due to an incorrect checkpoint value.
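
To make the concern on 2 concrete, here is a minimal sketch of a LIMIT-based pull that sorts by the checkpoint column. It is not the DeltaStreamer implementation: the table `src`, the column `updated_at`, and the helpers `pull_batch` and `demo` are hypothetical names, and an in-memory SQLite database stands in for the RDBMS. Because each batch is ordered by the checkpoint column, the last row of a batch carries the largest value seen, so it is safe to use as the next checkpoint.

```python
import sqlite3


def pull_batch(conn, checkpoint, limit):
    """Fetch up to `limit` rows newer than `checkpoint`, oldest first."""
    rows = conn.execute(
        "SELECT id, updated_at FROM src "
        "WHERE updated_at > ? ORDER BY updated_at LIMIT ?",
        (checkpoint, limit),
    ).fetchall()
    # The batch is sorted, so the last row holds the largest value seen.
    # With no rows, keep the previous checkpoint unchanged.
    new_checkpoint = rows[-1][1] if rows else checkpoint
    return rows, new_checkpoint


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (id INTEGER, updated_at INTEGER)")
    # Rows are deliberately inserted out of timestamp order.
    conn.executemany(
        "INSERT INTO src VALUES (?, ?)",
        [(1, 50), (2, 10), (3, 40), (4, 20), (5, 30)],
    )
    checkpoint, seen = 0, []
    while True:
        rows, checkpoint = pull_batch(conn, checkpoint, limit=2)
        if not rows:
            break
        seen.extend(row[0] for row in rows)
    return seen, checkpoint
```

Without the ORDER BY, a LIMIT could return an arbitrary subset, and taking the max of that subset as the checkpoint would skip older rows that were not returned. Note that this sketch assumes distinct checkpoint values; rows that share a value at the LIMIT boundary can still be missed with a strict `>` comparison, which is one of the corner cases worth calling out.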

 

On 3, I'm with you. If we can support a simple tailing mechanism for now, 
it'll suffice. (JDBC-based pulling has its own corner cases, which we can 
call out.)

> JDBC incremental load to HUDI with DeltaStreamer
> ------------------------------------------------
>
>                 Key: HUDI-251
>                 URL: https://issues.apache.org/jira/browse/HUDI-251
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: DeltaStreamer
>    Affects Versions: 0.9.0
>            Reporter: Taher Koitawala
>            Assignee: Purushotham Pushpavanthar
>            Priority: Trivial
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Mirroring RDBMS to HUDI is one of the most basic use cases of HUDI. Hence, 
> for such use cases, DeltaStreamer should provide inbuilt support.
> DeltaStreamer should accept something like jdbc-source.properties where users 
> can define the RDBMS connection properties along with a timestamp column and 
> an interval which allows users to express how frequently HUDI should check 
> with RDBMS data source for new inserts or updates.
> Details are documented in RFC-14
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
