>>Let's say I have a source table in Oracle in the format below. Will my
Avro schema for source and target be the same?

Yes. If you apply any transformations in between, DeltaStreamer can
derive the target schema automatically.

In the upcoming 0.5.2 release, we also have
org.apache.hudi.utilities.schema.JdbcbasedSchemaProvider, which should be
able to generate the source Avro schema from the table metadata
automatically.
https://github.com/apache/incubator-hudi/pull/1200
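As a rough illustration of what "generating the Avro schema from table metadata" involves, here is a minimal Python sketch for the `orders` table from the question. The column list is hand-copied from the DDL, and the Oracle-to-Avro type mapping below (small NUMBER(p,0) to int/long, VARCHAR to string, TIMESTAMP to long with the timestamp-micros logical type, nullable columns as a union with null) is our own assumption. It is not the logic of JdbcbasedSchemaProvider, which relies on Spark's JDBC type mapping.

```python
import json

# Hypothetical column metadata, hand-copied from the orders DDL in this thread:
# (column name, Oracle type, precision, scale, nullable)
ORDERS_COLUMNS = [
    ("order_id", "NUMBER", None, None, False),
    ("customer_id", "NUMBER", 6, 0, False),
    ("status", "VARCHAR", 20, None, False),
    ("salesman_id", "NUMBER", 6, 0, True),
    ("order_date", "TIMESTAMP", None, None, False),
]


def to_avro_type(oracle_type, precision, scale):
    """Map one Oracle column type to an Avro type (illustrative mapping only)."""
    if oracle_type == "NUMBER":
        if precision is None:
            # Assumption: an unconstrained NUMBER (e.g. the identity column)
            # holds integers that fit in 64 bits.
            return "long"
        if scale == 0 and precision <= 9:
            return "int"  # at most 9 digits always fits in a 32-bit int
        if scale == 0 and precision <= 18:
            return "long"  # at most 18 digits always fits in a 64-bit long
        # Anything else: Avro decimal logical type over bytes.
        return {"type": "bytes", "logicalType": "decimal",
                "precision": precision, "scale": scale or 0}
    if oracle_type in ("VARCHAR", "VARCHAR2", "CHAR"):
        return "string"
    if oracle_type == "TIMESTAMP":
        return {"type": "long", "logicalType": "timestamp-micros"}
    raise ValueError(f"unmapped Oracle type: {oracle_type}")


def avro_schema(table, columns):
    """Build an Avro record schema; nullable columns become ["null", T] unions."""
    fields = []
    for name, oracle_type, precision, scale, nullable in columns:
        avro_type = to_avro_type(oracle_type, precision, scale)
        if nullable:
            # A null default requires "null" to be the first union branch.
            fields.append({"name": name, "type": ["null", avro_type], "default": None})
        else:
            fields.append({"name": name, "type": avro_type})
    return {"type": "record", "name": table, "fields": fields}


if __name__ == "__main__":
    print(json.dumps(avro_schema("orders", ORDERS_COLUMNS), indent=2))
```

With no transformation in between, a schema like this would serve as both the source and the target schema.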

>>Our plan is to use AWS DMS for initial load & CDC.
With DMS, you get Parquet source files, which are self-describing.
DeltaStreamer does not interact with Oracle directly; DMS handles the
mapping of the Oracle table schema to the Parquet schema. It's much simpler.

On Wed, Mar 18, 2020 at 10:14 AM Shiyan Xu <[email protected]>
wrote:

> To answer your question regarding the properties file: it is a way to
> manage a set of hoodie configurations. Those configs are merged with any
> others passed via --hoodie-conf. See this line:
> <https://github.com/apache/incubator-hudi/blob/779edc068865898049569da0fe750574f93a0dca/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java#L362>.
> So any hoodie config can be put there. Usually we put "configurations for
> hoodie client, schema provider, key generator and data source" (per the
> docs).
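To make the merge concrete, here is a minimal Python sketch of how a properties file and --hoodie-conf key=value overrides might combine. The parsing is simplified (key=value lines, # comments only), the sample keys are just example Hudi option names, and the rule that command-line values win over file values is an assumption about DeltaStreamer's behaviour, not a transcription of the linked code.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments (simplified)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def merge_configs(file_props, hoodie_confs):
    """Overlay --hoodie-conf key=value strings on the file props (assumed: CLI wins)."""
    merged = dict(file_props)  # copy so the parsed file props stay untouched
    for conf in hoodie_confs:
        key, sep, value = conf.partition("=")
        if not sep:
            raise ValueError(f"--hoodie-conf expects key=value, got {conf!r}")
        merged[key.strip()] = value.strip()
    return merged


if __name__ == "__main__":
    # Hypothetical properties file contents using example Hudi keys.
    file_text = """
    # record key and partition path for the orders table
    hoodie.datasource.write.recordkey.field=order_id
    hoodie.datasource.write.partitionpath.field=order_date
    """
    merged = merge_configs(parse_properties(file_text),
                           ["hoodie.datasource.write.partitionpath.field=status"])
    for key, value in sorted(merged.items()):
        print(f"{key}={value}")
```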
>
> On Wed, Mar 18, 2020 at 6:50 AM Syed Zaidi <[email protected]>
> wrote:
>
> > Hi,
> >
> > I hope things are good. We are planning on using DeltaStreamer as a
> > client for Hudi. Our plan is to use AWS DMS for initial load & CDC. The
> > question I have is around the documentation for the properties files that
> > I need for dfs, source & target. Where can I find more information on the
> > properties files needed for the client?
> >
> > Let's say I have a source table in Oracle in the format below. Will my
> > Avro schema for source and target be the same?
> >
> > CREATE TABLE orders
> >   (
> >     order_id    NUMBER GENERATED BY DEFAULT AS IDENTITY START WITH 106 PRIMARY KEY,
> >     customer_id NUMBER(6, 0) NOT NULL,
> >     status      VARCHAR(20) NOT NULL,
> >     salesman_id NUMBER(6, 0),
> >     order_date  TIMESTAMP NOT NULL
> >   );
> >
> > I would appreciate your help in this regard.
> >
> > We are on this stack:
> >
> > EMR: emr-5.29.0
> > Spark: Spark 2.4.4, spark-avro_2.11:2.4.4
> >
> > Thanks
> > Syed Zaidi
> >
>
