Hi Taher, IMO it would be a good addition to Hudi.
So +1 from my side.

Vinoth Chandar <[email protected]> wrote on Saturday, September 14, 2019 at 10:23 PM:

> Hi Taher,
>
> I am fully onboard on this. This is such a frequently asked question and
> having it all doable with a simple DeltaStreamer command would be really
> powerful.
>
> +1
>
> - Vinoth
>
> On 2019/09/14 05:51:05, Taher Koitawala <[email protected]> wrote:
> > Hi All,
> >          Currently, we are trying to pull data incrementally from our
> > RDBMS sources; however, the way we do this with Hudi today is to create
> > a Spark table on top of the JDBC source using [1], which writes raw data
> > to an HDFS dir. We then use DeltaStreamer's dfs-source to write that to
> > a Hudi upsert COPY_ON_WRITE table.
> >
> > However, I think it would be really helpful in such use cases if
> > DeltaStreamer had something like a JDBC source instead of Sqoop or temp
> > tables. We could then leave it running in continuous mode with a
> > timestamp column and an interval that expresses how frequently
> > DeltaStreamer should check for new updates or inserts on the RDBMS.
> >
> > 1: CREATE TABLE mysql_temp_table
> >    USING org.apache.spark.sql.jdbc
> >    OPTIONS (
> >      url "jdbc:mysql://data.source.mysql.com:3306/database?user=mysql_user&password=password&zeroDateTimeBehavior=CONVERT_TO_NULL",
> >      dbtable "database.table_name",
> >      fetchSize "1000000",
> >      partitionColumn "contact_id",
> >      lowerBound "1",
> >      upperBound "2962429",
> >      numPartitions "62"
> >    );
> >
> > Regards,
> > Taher Koitawala
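The incremental-pull pattern Taher describes, tracking a checkpoint on a timestamp column and fetching only rows past it on each poll, can be sketched roughly as follows. This is an illustration only, not DeltaStreamer code: it uses Python's sqlite3 as a stand-in for the RDBMS, and the table name `contacts` and column `updated_at` are hypothetical.

```python
import sqlite3

def incremental_pull(conn, checkpoint):
    """Fetch rows changed since `checkpoint`; return (rows, new_checkpoint).

    Mirrors the idea of a JDBC source in continuous mode: each call is one
    poll, and the checkpoint is the highest timestamp seen so far.
    """
    rows = conn.execute(
        "SELECT id, name, updated_at FROM contacts WHERE updated_at > ? "
        "ORDER BY updated_at",
        (checkpoint,),
    ).fetchall()
    # Advance the checkpoint only when new rows arrived.
    new_checkpoint = rows[-1][2] if rows else checkpoint
    return rows, new_checkpoint

# Demo with an in-memory database standing in for the MySQL source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER, name TEXT, updated_at INTEGER)")
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)",
                 [(1, "a", 100), (2, "b", 200)])

rows, ckpt = incremental_pull(conn, 0)         # first poll: both rows
conn.execute("INSERT INTO contacts VALUES (3, 'c', 300)")
new_rows, ckpt = incremental_pull(conn, ckpt)  # next poll: only the new row
```

A real source would run this loop on the requested interval and hand each batch to the Hudi upsert path, with the checkpoint persisted in the commit metadata rather than a local variable.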
