Thank you all for your support. JIRA filed at https://issues.apache.org/jira/browse/HUDI-251
Regards,
Taher Koitawala

On Mon, Sep 16, 2019 at 11:34 AM Taher Koitawala <[email protected]> wrote:

> Since everyone is fully onboard, I am creating a JIRA to track this.
>
> On Sun, Sep 15, 2019 at 9:47 AM [email protected] <[email protected]> wrote:
>
>> +1. Agree with everyone's point. Go for it Taher !!
>> Balaji.V
>>
>> On Saturday, September 14, 2019, 07:44:04 PM PDT, Bhavani Sudha Saktheeswaran <[email protected]> wrote:
>>
>> +1 I think adding new sources to DeltaStreamer is really valuable.
>>
>> Thanks,
>> Sudha
>>
>> On Sat, Sep 14, 2019 at 7:52 AM vino yang <[email protected]> wrote:
>>
>> > Hi Taher,
>> >
>> > IMO, it's a good supplement to Hudi.
>> >
>> > So +1 from my side.
>> >
>> > On Sat, Sep 14, 2019 at 10:23 PM, Vinoth Chandar <[email protected]> wrote:
>> >
>> > > Hi Taher,
>> > >
>> > > I am fully onboard on this. This is such a frequently asked question,
>> > > and having it all doable with a simple DeltaStreamer command would be
>> > > really powerful.
>> > >
>> > > +1
>> > >
>> > > - Vinoth
>> > >
>> > > On 2019/09/14 05:51:05, Taher Koitawala <[email protected]> wrote:
>> > > > Hi All,
>> > > > Currently, we are trying to pull data incrementally from our RDBMS
>> > > > sources. The way we are doing this with HUDI is to create a Spark
>> > > > table on top of the JDBC source using [1], which writes raw data to
>> > > > an HDFS dir. We then use the DeltaStreamer dfs-source to write that
>> > > > to a HUDI upsert COPY_ON_WRITE table.
>> > > >
>> > > > However, I think it would be really helpful in such use cases if
>> > > > DeltaStreamer had something like a JDBC source instead of sqoop or
>> > > > temp tables. We could then leave that in continuous mode with a
>> > > > timestamp column and an interval that lets us express how frequently
>> > > > DeltaStreamer should check for new updates or inserts on the RDBMS.
>> > > >
>> > > > [1] CREATE TABLE mysql_temp_table
>> > > > USING org.apache.spark.sql.jdbc
>> > > > OPTIONS (
>> > > >   url "jdbc:mysql://data.source.mysql.com:3306/database?user=mysql_user&password=password&zeroDateTimeBehavior=CONVERT_TO_NULL",
>> > > >   dbtable "database.table_name",
>> > > >   fetchSize "1000000",
>> > > >   partitionColumn "contact_id", lowerBound "1",
>> > > >   upperBound "2962429",
>> > > >   numPartitions "62"
>> > > > );
>> > > >
>> > > > Regards,
>> > > > Taher Koitawala
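
For illustration, the timestamp-column polling proposed in the original message could look roughly like the sketch below. It is not DeltaStreamer code and does not reflect any Hudi API. It assumes an in-memory SQLite database standing in for the RDBMS, and the helper names (pull_incremental, make_demo_source), the contacts table, and the updated_at column are all hypothetical; only contact_id is taken from the thread.

```python
import sqlite3


def pull_incremental(conn, table, ts_column, last_checkpoint):
    """Fetch rows whose ts_column is strictly greater than last_checkpoint.

    Returns (rows, new_checkpoint). Hypothetical helper, not a Hudi API.
    """
    # Table and column names are interpolated into the SQL text, so they
    # must come from trusted configuration; the checkpoint is bound safely.
    query = f"SELECT * FROM {table} WHERE {ts_column} > ? ORDER BY {ts_column}"
    cursor = conn.execute(query, (last_checkpoint,))
    columns = [desc[0] for desc in cursor.description]
    rows = [dict(zip(columns, values)) for values in cursor.fetchall()]
    # Advance the checkpoint only when something new was read.
    new_checkpoint = rows[-1][ts_column] if rows else last_checkpoint
    return rows, new_checkpoint


def make_demo_source():
    """Build a tiny stand-in source table (assumed schema, for the demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts ("
        "contact_id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO contacts VALUES (?, ?, ?)",
        [(1, "a", 100), (2, "b", 105)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    source = make_demo_source()
    checkpoint = 0

    # First poll: everything is newer than the initial checkpoint.
    batch, checkpoint = pull_incremental(source, "contacts", "updated_at", checkpoint)
    print(len(batch), checkpoint)

    # Simulate one update and one insert on the source between polls.
    source.execute(
        "UPDATE contacts SET name = ?, updated_at = ? WHERE contact_id = ?",
        ("b2", 110, 2),
    )
    source.execute("INSERT INTO contacts VALUES (?, ?, ?)", (3, "c", 120))

    # Second poll: only the changed and the new row come back.
    batch, checkpoint = pull_incremental(source, "contacts", "updated_at", checkpoint)
    print([row["contact_id"] for row in batch], checkpoint)
```

In a continuous mode, a loop would call such a function once per configured interval and hand each batch to the upsert step. Note that the strict greater-than comparison can miss rows committed late with a timestamp equal to the stored checkpoint, so a real source would need a policy for that case.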
