Will this be the same implementation as session.read.jdbc("") and then call
this code continuously, like how we are running Hudi in continuous mode?

On Mon, Sep 16, 2019 at 9:09 PM Vinoth Chandar <[email protected]> wrote:

> Thanks, Taher! Any takers for driving this? This is something I would be
> very interested in getting involved with. Don't have the bandwidth atm :/
>
> On Sun, Sep 15, 2019 at 11:15 PM Taher Koitawala <[email protected]> wrote:
>
>> Thank you all for your support. JIRA filed at
>> https://issues.apache.org/jira/browse/HUDI-251
>>
>> Regards,
>> Taher Koitawala
>>
>> On Mon, Sep 16, 2019 at 11:34 AM Taher Koitawala <[email protected]> wrote:
>>
>>> Since everyone is fully onboard, I am creating a JIRA to track this.
>>>
>>> On Sun, Sep 15, 2019 at 9:47 AM [email protected] <[email protected]> wrote:
>>>
>>>> +1. Agree with everyone's point. Go for it, Taher!
>>>> Balaji.V
>>>>
>>>> On Saturday, September 14, 2019, 07:44:04 PM PDT, Bhavani Sudha
>>>> Saktheeswaran <[email protected]> wrote:
>>>>
>>>>> +1 I think adding new sources to DeltaStreamer is really valuable.
>>>>>
>>>>> Thanks,
>>>>> Sudha
>>>>>
>>>>> On Sat, Sep 14, 2019 at 7:52 AM vino yang <[email protected]> wrote:
>>>>>
>>>>>> Hi Taher,
>>>>>>
>>>>>> IMO, it's a good supplement to Hudi.
>>>>>>
>>>>>> So +1 from my side.
>>>>>>
>>>>>> On Sat, Sep 14, 2019 at 10:23 PM, Vinoth Chandar <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Taher,
>>>>>>>
>>>>>>> I am fully onboard on this. This is such a frequently asked question,
>>>>>>> and having it all doable with a simple DeltaStreamer command would be
>>>>>>> really powerful.
>>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> - Vinoth
>>>>>>>
>>>>>>> On 2019/09/14 05:51:05, Taher Koitawala <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>> Currently, we are trying to pull data incrementally from our
>>>>>>>> RDBMS sources. The way we are doing this with Hudi is to create a
>>>>>>>> Spark table on top of the JDBC source using [1], which writes raw
>>>>>>>> data to an HDFS dir. We then use the DeltaStreamer dfs-source to
>>>>>>>> write that to a Hudi upsert COPY_ON_WRITE table.
>>>>>>>>
>>>>>>>> However, I think it would be really helpful in such use cases if
>>>>>>>> DeltaStreamer had something like a JDBC source instead of Sqoop or
>>>>>>>> temp tables. We could then leave that in continuous mode with a
>>>>>>>> timestamp column and an interval, which allows us to express how
>>>>>>>> frequently DeltaStreamer should check for new updates or inserts
>>>>>>>> on the RDBMS.
>>>>>>>>
>>>>>>>> 1: CREATE TABLE mysql_temp_table
>>>>>>>>    USING org.apache.spark.sql.jdbc
>>>>>>>>    OPTIONS (
>>>>>>>>      url "jdbc:mysql://data.source.mysql.com:3306/database?user=mysql_user&password=password&zeroDateTimeBehavior=CONVERT_TO_NULL",
>>>>>>>>      dbtable "database.table_name",
>>>>>>>>      fetchSize "1000000",
>>>>>>>>      partitionColumn "contact_id",
>>>>>>>>      lowerBound "1",
>>>>>>>>      upperBound "2962429",
>>>>>>>>      numPartitions "62"
>>>>>>>>    );
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Taher Koitawala
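The incremental pull discussed in the thread above (a timestamp column plus a polling interval, with a checkpoint so each poll only fetches new rows) can be sketched as follows. This is only an illustration of the checkpointing logic being proposed, not actual DeltaStreamer or Hudi code; `pull_incremental`, the `contacts` table, and the `updated_at` column are hypothetical names, and Python's built-in sqlite3 stands in for the MySQL JDBC source.

```python
import sqlite3

def pull_incremental(conn, table, ts_col, checkpoint):
    """Fetch rows whose ts_col is strictly greater than the checkpoint.

    Returns (rows, new_checkpoint): the new checkpoint is the largest
    timestamp seen, or the old checkpoint if nothing new arrived. A
    continuous-mode runner would call this on a fixed interval.
    """
    rows = conn.execute(
        f"SELECT id, value, {ts_col} FROM {table} "
        f"WHERE {ts_col} > ? ORDER BY {ts_col}",
        (checkpoint,),
    ).fetchall()
    new_checkpoint = rows[-1][2] if rows else checkpoint
    return rows, new_checkpoint

# Demo against an in-memory table (sqlite3 substitutes for the RDBMS).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER, value TEXT, updated_at INTEGER)")
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)",
                 [(1, "a", 100), (2, "b", 200)])

# First poll starts from checkpoint 0 and sees both existing rows.
rows, ckpt = pull_incremental(conn, "contacts", "updated_at", 0)
print(len(rows), ckpt)   # 2 200

# A new row lands; the next poll resumes from the saved checkpoint
# and picks up only the delta.
conn.execute("INSERT INTO contacts VALUES (3, 'c', 300)")
rows, ckpt = pull_incremental(conn, "contacts", "updated_at", ckpt)
print(len(rows), ckpt)   # 1 300
```

In a real JDBC source the checkpoint would be persisted alongside the Hudi commit metadata so a restarted job resumes from the last successfully committed timestamp rather than re-reading the whole table.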
