Hi Taher, IMO it would be a good addition to Hudi.
So +1 from my side.

Vinoth Chandar <[email protected]> wrote on Saturday, September 14, 2019 at 10:23 PM:

> Hi Taher,
>
> I am fully onboard on this. This is such a frequently asked question and
> having it all doable with a simple DeltaStreamer command would be really
> powerful.
>
> +1
>
> - Vinoth
>
> On 2019/09/14 05:51:05, Taher Koitawala <[email protected]> wrote:
> > Hi All,
> >          Currently, we are trying to pull data incrementally from our
> > RDBMS sources; however, the way we do this with Hudi today is to create
> > a Spark table on top of the JDBC source using [1], which writes raw data
> > to an HDFS dir. We then use DeltaStreamer's dfs-source to write that to
> > a Hudi upsert COPY_ON_WRITE table.
> >
> > However, I think it would be really helpful in such use cases if
> > DeltaStreamer had something like a JDBC source instead of Sqoop or temp
> > tables. We could then leave it running in continuous mode with a
> > timestamp column and an interval that expresses how frequently
> > DeltaStreamer should check for new updates or inserts on the RDBMS.
> >
> > 1: CREATE TABLE mysql_temp_table
> >    USING org.apache.spark.sql.jdbc
> >    OPTIONS (
> >      url "jdbc:mysql://data.source.mysql.com:3306/database?user=mysql_user&password=password&zeroDateTimeBehavior=CONVERT_TO_NULL",
> >      dbtable "database.table_name",
> >      fetchSize "1000000",
> >      partitionColumn "contact_id",
> >      lowerBound "1",
> >      upperBound "2962429",
> >      numPartitions "62"
> >    );
> >
> > Regards,
> > Taher Koitawala
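The incremental-pull pattern Taher describes, tracking a checkpoint on a timestamp column and fetching only rows past it on each poll, can be sketched roughly as follows. This is an illustration only, not DeltaStreamer code: it uses Python's sqlite3 as a stand-in for the RDBMS, and the table name `contacts` and column `updated_at` are hypothetical.

```python
import sqlite3

def incremental_pull(conn, checkpoint):
    """Fetch rows changed since `checkpoint`; return (rows, new_checkpoint).

    Mirrors the idea of a JDBC source in continuous mode: each call is one
    poll, and the checkpoint is the highest timestamp seen so far.
    """
    rows = conn.execute(
        "SELECT id, name, updated_at FROM contacts WHERE updated_at > ? "
        "ORDER BY updated_at",
        (checkpoint,),
    ).fetchall()
    # Advance the checkpoint only when new rows arrived.
    new_checkpoint = rows[-1][2] if rows else checkpoint
    return rows, new_checkpoint

# Demo with an in-memory database standing in for the MySQL source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER, name TEXT, updated_at INTEGER)")
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)",
                 [(1, "a", 100), (2, "b", 200)])

rows, ckpt = incremental_pull(conn, 0)         # first poll: both rows
conn.execute("INSERT INTO contacts VALUES (3, 'c', 300)")
new_rows, ckpt = incremental_pull(conn, ckpt)  # next poll: only the new row
```

A real source would run this loop on the requested interval and hand each batch to the Hudi upsert path, with the checkpoint persisted in the commit metadata rather than a local variable.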
