nandurj edited a comment on issue #1586:
URL: https://github.com/apache/hudi/issues/1586#issuecomment-659633790


   I am using Hudi 0.5.2 on EMR 5.30 and running the job with the
   DeltaStreamer. This is the spark-submit command:
   
   spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     --jars /usr/lib/spark/external/lib/spark-avro_2.11-2.4.5-amzn-0.jar \
     --master yarn --deploy-mode client \
     --executor-memory 10G --executor-cores 4 \
     file:///usr/lib/hudi/hudi-utilities-bundle_2.11-0.5.2-incubating.jar \
     --table-type COPY_ON_WRITE \
     --source-ordering-field TIMESTAMP \
     --continuous \
     --enable-hive-sync \
     --min-sync-interval-seconds 60 \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
     --target-base-path s3://mybucket/CoWex --target-table table_test \
     --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
     --hoodie-conf hoodie.datasource.write.recordkey.field="Field1, Field2, Field3" \
     --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
     --hoodie-conf hoodie.datasource.write.partitionpath.field="Field1" \
     --hoodie-conf hoodie.datasource.hive_sync.database=testdb \
     --hoodie-conf hoodie.datasource.hive_sync.table=test_table \
     --hoodie-conf hoodie.datasource.hive_sync.partition_fields="datefield" \
     --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor \
     --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3://mybucket/input
   
   
   Spark-shell output:
   scala> spark.sql("""select _hoodie_record_key from testdb.test_table""").show(false)
   +--------------------------------------------------------------------+
   |_hoodie_record_key                                                  |
   +--------------------------------------------------------------------+
   |Field1:[0, 0]|
   +--------------------------------------------------------------------+
   
   In the output above, Field2 and Field3 are missing from the record key.
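
   To make the expected result concrete, here is a minimal sketch in plain
   Python (not Hudi code) of a composite key built as comma-separated
   `field:value` pairs, which is the shape I would expect to see in
   _hoodie_record_key. The function name build_composite_key and the sample
   values are hypothetical. Trimming whitespace around the field names is an
   assumption about how a list such as "Field1, Field2, Field3" should be
   read, not a statement about what ComplexKeyGenerator does in 0.5.2.

```python
def build_composite_key(record, key_fields):
    """Join name:value pairs for each configured key field (illustrative only)."""
    # Split the comma-separated field list and trim surrounding whitespace,
    # so "Field1, Field2" yields "Field1" and "Field2" rather than " Field2".
    names = [name.strip() for name in key_fields.split(",")]
    return ",".join(f"{name}:{record[name]}" for name in names)


# Hypothetical sample row; the values are made up for illustration.
row = {"Field1": "a", "Field2": "b", "Field3": "c"}
key = build_composite_key(row, "Field1, Field2, Field3")
print(key)
```

   With this sample row the key contains one name:value pair per configured
   field, which is what I expected instead of a key that only mentions Field1.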



