>>Let's say I have a source table in Oracle in the format below; will my Avro schema for source and target be the same?
Yes. If you do any transformations in between, then DeltaStreamer can make the target schema automatically. In the upcoming 0.5.2 release, we will also have org.apache.hudi.utilities.schema.JdbcbasedSchemaProvider, which should be able to generate the source Avro schema from the table metadata automatically.
https://github.com/apache/incubator-hudi/pull/1200

>>Our plan is to use AWS DMS for initial load & CDC.
With DMS, you get Parquet source files, which are self-describing. DeltaStreamer does not interact with Oracle directly; DMS handles the mapping of the Oracle table schema to the Parquet schema. It's much simpler.

On Wed, Mar 18, 2020 at 10:14 AM Shiyan Xu <[email protected]> wrote:

> To answer your question regarding the properties file:
> It is a way to manage a bunch of hoodie configurations; those confs will be
> merged with other confs passed from --hoodie-conf. See this line
> <https://github.com/apache/incubator-hudi/blob/779edc068865898049569da0fe750574f93a0dca/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java#L362>.
> So any hoodie conf can be put there. Usually we put "configurations for
> hoodie client, schema provider, key generator and data source" (per the
> docs).
>
> On Wed, Mar 18, 2020 at 6:50 AM Syed Zaidi <[email protected]>
> wrote:
>
> > Hi,
> >
> > I hope things are good. We are planning on using DeltaStreamer as a client
> > for Hudi. Our plan is to use AWS DMS for initial load & CDC. The question I
> > have is around the documentation for the properties file that I need for
> > dfs, source & target. Where can I find more information on the properties
> > files needed for the client?
> >
> > Let's say I have a source table in Oracle in the format below; will my
> > Avro schema for source and target be the same?
> >
> > CREATE TABLE orders
> > (
> >   order_id NUMBER GENERATED BY DEFAULT AS IDENTITY START WITH 106
> >     PRIMARY KEY,
> >   customer_id NUMBER( 6, 0 ) NOT NULL,
> >   status VARCHAR( 20 ) NOT NULL,
> >   salesman_id NUMBER( 6, 0 ),
> >   order_date TIMESTAMP NOT NULL
> > );
> >
> > I would appreciate your help in this regard.
> >
> > We are on this stack:
> >
> > EMR: emr-5.29.0
> > Spark: Spark 2.4.4, spark-avro_2.11:2.4.4
> >
> > Thanks
> > Syed Zaidi
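
As a rough illustration of the properties-file merge described in the thread (a sketch only, not Hudi's actual code): the file holds key=value hoodie confs, and the values passed via --hoodie-conf are merged on top. The sketch assumes the --hoodie-conf values win when a key appears in both places, which is worth confirming against the linked line. The function names, the S3 path, and the field choices for the orders table are all hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def merge_confs(file_text, cli_confs):
    """Merge file properties with key=value strings from the command line.

    Assumption: command-line entries override file entries on a key clash.
    """
    merged = parse_properties(file_text)
    merged.update(parse_properties("\n".join(cli_confs)))
    return merged


# Hypothetical file contents; the bucket path and field choices are placeholders.
PROPS_FILE = """\
# key generator
hoodie.datasource.write.recordkey.field=order_id
hoodie.datasource.write.partitionpath.field=order_date
# data source (made-up location for the DMS Parquet output)
hoodie.deltastreamer.source.dfs.root=s3://example-bucket/dms/orders
"""

# Stand-in for one value passed as --hoodie-conf on the command line.
merged = merge_confs(
    PROPS_FILE,
    ["hoodie.datasource.write.partitionpath.field=status"],
)
for key in sorted(merged):
    print(f"{key}={merged[key]}")
```

Here the partition path field from the file is replaced by the command-line value, while the other two entries pass through unchanged.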
