[jira] [Commented] (HUDI-251) JDBC incremental load to HUDI with DeltaStreamer

Sagar Sumit (Jira) Mon, 19 Apr 2021 09:10:07 -0700


    [ 
https://issues.apache.org/jira/browse/HUDI-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325156#comment-17325156
 ]


Sagar Sumit commented on HUDI-251:
----------------------------------

# Yes. The sequence is: fetch() -> persist() -> checkpoint() -> form the pair 
with the persisted dataset and the checkpoint -> unpersist() -> return the pair.
 # That's a valid point. What if the number of records since the last 
checkpoint is greater than sourceLimit? Even if we order by the checkpoint 
column, we will miss some records. That means we need some sort of pagination 
on top of sorting (running multiple queries of the form select from..where 
ckpt > last_ckpt order by ckpt desc limit x). Won't this be costlier than a 
single select query without a limit?
 # Can you please elaborate on the tailing mechanism? Is it related to the 
pagination point I mentioned above?
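
To make the pagination idea in point 2 concrete, here is a minimal sketch in Python against an in-memory SQLite table. It is not the DeltaStreamer implementation: the table name `src`, the column `ckpt`, and the helpers `make_demo_db` and `fetch_in_pages` are all hypothetical. The sketch orders ascending, which is an assumption on my part, so that the largest checkpoint value in each page can serve as the lower bound for the next query. It also assumes checkpoint values are unique; with duplicates, a strict `ckpt > last_ckpt` filter could skip rows that share the boundary value.

```python
import sqlite3


def make_demo_db():
    """Hypothetical source table: 10 rows with unique checkpoint values 1..10."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, ckpt INTEGER)")
    conn.executemany(
        "INSERT INTO src (id, ckpt) VALUES (?, ?)",
        [(i, i) for i in range(1, 11)],
    )
    conn.commit()
    return conn


def fetch_in_pages(conn, last_ckpt, page_size):
    """Yield pages of (id, ckpt) rows newer than last_ckpt, oldest first.

    Each page is capped at page_size rows (playing the role of sourceLimit).
    The checkpoint advances to the last row of the page just returned.
    """
    while True:
        rows = conn.execute(
            "SELECT id, ckpt FROM src WHERE ckpt > ? ORDER BY ckpt ASC LIMIT ?",
            (last_ckpt, page_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        # Assumes unique ckpt values; ties at the boundary would be skipped.
        last_ckpt = rows[-1][1]


if __name__ == "__main__":
    conn = make_demo_db()
    for page in fetch_in_pages(conn, last_ckpt=3, page_size=3):
        print([ckpt for _, ckpt in page])
```

Each page costs one extra query compared with a single unbounded select, which is the trade-off raised above: the bounded version caps the batch size per round at the price of more round trips to the database.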

> JDBC incremental load to HUDI with DeltaStreamer
> ------------------------------------------------
>
>                 Key: HUDI-251
>                 URL: https://issues.apache.org/jira/browse/HUDI-251
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: DeltaStreamer
>    Affects Versions: 0.9.0
>            Reporter: Taher Koitawala
>            Assignee: Purushotham Pushpavanthar
>            Priority: Trivial
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Mirroring RDBMS to HUDI is one of the most basic use cases of HUDI. Hence, 
> for such use cases, DeltaStreamer should provide inbuilt support.
> DeltaStreamer should accept something like jdbc-source.properties where users 
> can define the RDBMS connection properties along with a timestamp column and 
> an interval which allows users to express how frequently HUDI should check 
> with RDBMS data source for new inserts or updates.
> Details are documented in RFC-14
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
