[GitHub] [hudi] nandurj commented on issue #1586: [SUPPORT] DMS with 2 key example

GitBox Thu, 16 Jul 2020 12:53:25 -0700


nandurj commented on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-659633790



   I am working with Hudi 0.5.2 on EMR 5.30 and running the job with 
DeltaStreamer. This is the spark-submit command I am using:
   
   spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     --jars /usr/lib/spark/external/lib/spark-avro_2.11-2.4.5-amzn-0.jar \
     --master yarn --deploy-mode client \
     --executor-memory 10G --executor-cores 4 \
     file:///usr/lib/hudi/hudi-utilities-bundle_2.11-0.5.2-incubating.jar \
     --table-type COPY_ON_WRITE \
     --source-ordering-field TIMESTAMP \
     --continuous \
     --enable-hive-sync \
     --min-sync-interval-seconds 60 \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
     --target-base-path s3://mybucket/CoWex --target-table table_test \
     --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
     --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
     --hoodie-conf hoodie.datasource.write.recordkey.field="Field1, Field2, Field3" \
     --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
     --hoodie-conf hoodie.datasource.write.partitionpath.field="Field1" \
     --hoodie-conf hoodie.datasource.hive_sync.database=testdb \
     --hoodie-conf hoodie.datasource.hive_sync.table=test_table \
     --hoodie-conf hoodie.datasource.hive_sync.partition_fields="datefield" \
     --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor \
     --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3://mybucket/input
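
   For reference, the same Hudi settings could also be kept in a properties 
file instead of a long list of --hoodie-conf flags, which avoids shell quoting 
and line-wrapping problems. The following is only a sketch: the PROPS_FILE 
variable and the /tmp/dms-cow.properties path are hypothetical names chosen for 
illustration, and the values are copied unchanged from the command above.

```shell
#!/bin/sh
# Sketch only: PROPS_FILE and its default path are hypothetical.
PROPS_FILE="${PROPS_FILE:-/tmp/dms-cow.properties}"

# Write the key/value pairs that the command above passes via --hoodie-conf.
# The quoted 'EOF' keeps the shell from expanding anything inside the heredoc.
cat > "$PROPS_FILE" <<'EOF'
hoodie.datasource.write.recordkey.field=Field1, Field2, Field3
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
hoodie.datasource.write.partitionpath.field=Field1
hoodie.datasource.hive_sync.database=testdb
hoodie.datasource.hive_sync.table=test_table
hoodie.datasource.hive_sync.partition_fields=datefield
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor
hoodie.deltastreamer.source.dfs.root=s3://mybucket/input
EOF

echo "Wrote $PROPS_FILE"
```

   The file could then be passed to HoodieDeltaStreamer with its --props 
option (for example --props file://$PROPS_FILE) in place of the individual 
--hoodie-conf flags. This has not been tested against this setup, and whether 
a local path is readable depends on the deploy mode.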


