[ 
https://issues.apache.org/jira/browse/HUDI-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324260#comment-17324260
 ] 

Sagar Sumit commented on HUDI-251:
----------------------------------

Hi [~vinoth], I went through the latest PR #969 and the RFC. Building upon the 
work done in #969, we can add the following, as suggested:
 # Cache the dataset to avoid reading from the database twice.
 # Add a limit on how many rows to fetch.
 # Add a simple SQL query builder that supports the LIMIT clause and the AND 
operator. The latter would be useful for supporting a timestamp column in 
addition to the incremental column, as mentioned in the RFC. The query builder 
can also be enhanced later to support other features mentioned in the RFC. 
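
To make the third item concrete, here is a minimal sketch of what such a query 
builder could look like. It is only an illustration, not the code from PR #969 
or the design in the RFC: the class name SqlQueryBuilder, its methods, and the 
table and column names in the usage note are all hypothetical. It assumes the 
predicates come from trusted job configuration, and that the target database 
accepts a trailing LIMIT clause (MySQL and PostgreSQL do; some other databases 
use a different syntax).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a tiny builder for SELECT ... WHERE a AND b ... LIMIT n.
// Predicates are appended verbatim, so they must come from trusted configuration,
// never from untrusted user input.
final class SqlQueryBuilder {
    private final String columns;
    private String table;
    private final List<String> conditions = new ArrayList<>();
    private Long limit; // null means no LIMIT clause is emitted

    private SqlQueryBuilder(String columns) {
        this.columns = columns;
    }

    static SqlQueryBuilder select(String... columns) {
        if (columns.length == 0) {
            throw new IllegalArgumentException("at least one column is required");
        }
        return new SqlQueryBuilder(String.join(", ", columns));
    }

    SqlQueryBuilder from(String table) {
        this.table = table;
        return this;
    }

    // First predicate, e.g. on the incremental column.
    SqlQueryBuilder where(String predicate) {
        if (!conditions.isEmpty()) {
            throw new IllegalStateException("where() already called; use and()");
        }
        conditions.add(predicate);
        return this;
    }

    // Additional predicate joined with AND, e.g. on a timestamp column.
    SqlQueryBuilder and(String predicate) {
        if (conditions.isEmpty()) {
            throw new IllegalStateException("call where() before and()");
        }
        conditions.add(predicate);
        return this;
    }

    // Caps the number of rows fetched in one pull.
    SqlQueryBuilder limit(long maxRows) {
        if (maxRows < 0) {
            throw new IllegalArgumentException("limit must be non-negative");
        }
        this.limit = maxRows;
        return this;
    }

    @Override
    public String toString() {
        if (table == null) {
            throw new IllegalStateException("from() is required");
        }
        StringBuilder sql = new StringBuilder("SELECT ")
            .append(columns).append(" FROM ").append(table);
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        if (limit != null) {
            sql.append(" LIMIT ").append(limit);
        }
        return sql.toString();
    }
}
```

With this sketch, SqlQueryBuilder.select("*").from("orders").where("id > 100")
.and("updated_at > '2021-04-01 00:00:00'").limit(1000).toString() yields 
SELECT * FROM orders WHERE id > 100 AND updated_at > '2021-04-01 00:00:00' LIMIT 1000. 
An ORDER BY on the incremental column would probably also be needed so that a 
limited fetch returns the oldest unread rows first; it is left out here to 
keep the sketch short.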

> JDBC incremental load to HUDI with DeltaStreamer
> ------------------------------------------------
>
>                 Key: HUDI-251
>                 URL: https://issues.apache.org/jira/browse/HUDI-251
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: DeltaStreamer
>    Affects Versions: 0.9.0
>            Reporter: Taher Koitawala
>            Assignee: Purushotham Pushpavanthar
>            Priority: Trivial
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Mirroring RDBMS to HUDI is one of the most basic use cases of HUDI. Hence, 
> for such use cases, DeltaStreamer should provide inbuilt support.
> DeltaStreamer should accept something like jdbc-source.properties where users 
> can define the RDBMS connection properties along with a timestamp column and 
> an interval which allows users to express how frequently HUDI should check 
> with RDBMS data source for new inserts or updates.
> Details are documented in RFC-14
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
