Re: [DISCUSS] [VOTE] JDBC incremental load with DeltaStreamer

Vinoth Chandar Mon, 16 Sep 2019 08:39:20 -0700

Thanks, Taher! Any takers for driving this? This is something I would be
very interested in getting involved with. Don't have the bandwidth atm :/


On Sun, Sep 15, 2019 at 11:15 PM Taher Koitawala <[email protected]> wrote:

> Thank you all for your support. JIRA filed at
> https://issues.apache.org/jira/browse/HUDI-251
>
> Regards,
> Taher Koitawala
>
> On Mon, Sep 16, 2019 at 11:34 AM Taher Koitawala <[email protected]>
> wrote:
>
> > Since everyone is fully on board, I am creating a JIRA to track this.
> >
> > On Sun, Sep 15, 2019 at 9:47 AM [email protected] <[email protected]>
> > wrote:
> >
> >>
> >> +1. Agree with everyone's point. Go for it Taher !!
> >> Balaji.V
> >>
> >> On Saturday, September 14, 2019, 07:44:04 PM PDT, Bhavani Sudha
> >> Saktheeswaran <[email protected]> wrote:
> >>
> >> +1 I think adding new sources to DeltaStreamer is really valuable.
> >>
> >> Thanks,
> >> Sudha
> >>
> >> On Sat, Sep 14, 2019 at 7:52 AM vino yang <[email protected]> wrote:
> >>
> >> > Hi Taher,
> >> >
> >> > IMO, it's a good supplement to Hudi.
> >> >
> >> > So +1 from my side.
> >> >
> >> > Vinoth Chandar <[email protected]> wrote on Sat, Sep 14, 2019 at 10:23 PM:
> >> >
> >> > > Hi Taher,
> >> > >
> >> > > I am fully on board with this. This is such a frequently asked
> >> > > question, and having it all doable with a simple DeltaStreamer
> >> > > command would be really powerful.
> >> > >
> >> > > +1
> >> > >
> >> > > - Vinoth
> >> > >
> >> > > On 2019/09/14 05:51:05, Taher Koitawala <[email protected]> wrote:
> >> > > > Hi All,
> >> > > >          Currently, we pull data incrementally from our RDBMS
> >> > > > sources into HUDI by creating a Spark table on top of the JDBC
> >> > > > source using [1], which writes the raw data to an HDFS dir. We
> >> > > > then use the DeltaStreamer dfs-source to write that to a HUDI
> >> > > > upsert COPY_ON_WRITE table.
> >> > > >
> >> > > >          However, I think it would be really helpful in such use
> >> > > > cases if DeltaStreamer had something like a JDBC-source instead of
> >> > > > sqoop or temp tables. We could then leave it running in continuous
> >> > > > mode with a timestamp column and an interval that expresses how
> >> > > > frequently DeltaStreamer should check for new updates or inserts
> >> > > > on the RDBMS.
> >> > > >
> >> > > > 1: CREATE TABLE mysql_temp_table
> >> > > > USING org.apache.spark.sql.jdbc
> >> > > > OPTIONS (
> >> > > >      url "jdbc:mysql://data.source.mysql.com:3306/database?user=mysql_user&password=password&zeroDateTimeBehavior=CONVERT_TO_NULL",
> >> > > >      dbtable "database.table_name",
> >> > > >      fetchSize "1000000",
> >> > > >      partitionColumn "contact_id",
> >> > > >      lowerBound "1",
> >> > > >      upperBound "2962429",
> >> > > >      numPartitions "62"
> >> > > > );
> >> > > >
> >> > > > Regards,
> >> > > > Taher Koitawala
> >> > > >
> >> > >
> >> >
> >
> >
>
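
The incremental pull proposed in the thread (a timestamp column used as a
checkpoint, polled at an interval) can be sketched roughly as follows. This is
only an illustration of the idea, not Hudi's or DeltaStreamer's API: the
function fetch_increment, the contacts table, and the updated_at column are
all hypothetical, and an in-memory SQLite database stands in for the JDBC
source so the sketch runs on its own.

```python
import sqlite3


def fetch_increment(conn, table, ts_column, checkpoint):
    """Return rows whose ts_column is newer than checkpoint, plus the new checkpoint.

    Hypothetical helper: table and column names are interpolated directly,
    so they are assumed to come from trusted configuration.
    """
    query = f"SELECT * FROM {table} WHERE {ts_column} > ? ORDER BY {ts_column}"
    cur = conn.execute(query, (checkpoint,))
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, r)) for r in cur.fetchall()]
    # Advance the checkpoint only when something new was read.
    new_checkpoint = rows[-1][ts_column] if rows else checkpoint
    return rows, new_checkpoint


def demo():
    # SQLite stands in for the RDBMS; timestamps are ISO-like strings,
    # which sort correctly under plain string comparison.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts ("
        "contact_id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO contacts VALUES (?, ?, ?)",
        [(1, "a", "2019-09-14 05:00:00"), (2, "b", "2019-09-14 06:00:00")],
    )

    # First poll: an empty checkpoint sorts before every timestamp.
    first, ckpt = fetch_increment(conn, "contacts", "updated_at", "")

    # Between polls, one row is updated and one is inserted upstream.
    conn.execute(
        "UPDATE contacts SET name = 'a2', updated_at = '2019-09-14 07:00:00' "
        "WHERE contact_id = 1"
    )
    conn.execute("INSERT INTO contacts VALUES (3, 'c', '2019-09-14 08:00:00')")

    # Second poll: only the changed and the new row come back.
    second, ckpt2 = fetch_increment(conn, "contacts", "updated_at", ckpt)
    return first, ckpt, second, ckpt2
```

In a continuous mode, a loop would call fetch_increment, upsert the returned
rows, persist the checkpoint, and sleep for the configured interval. Note that
a strict "greater than" comparison can miss rows that commit later with the
same timestamp as the checkpoint, so a real source would likely need an
overlap window or a tie-breaking key.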
