harishraju-govindaraju commented on issue #4641:
URL: https://github.com/apache/hudi/issues/4641#issuecomment-1017122913


   Hello @nsivabalan ,
   
   Thanks for promptly responding to my question. 
   
   I cleared the folder and reran the spark-submit command below. The 
.hoodie folder was created, but the job ended with the following error and 
wrote no data files. 
   
    Unrecognized token 'Objavro': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
    at [Source: (String)"Objavro.schema�{"type":"record","name":"topLevelRecord","fields":[{"name":"id","type":["string","null"]},{"name":"creation_date","type":["string","null"]},{"name":"last_update_time","type":["string","null"]},{"name":"quantity","type":["string","null"]},{"name":"compcode","type":["string","null"]}]}0org.apache.spark.version"; line: 1, column: 11]
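
   The 'Objavro.schema' prefix in the error looks like the header of an Avro 
object container file (the magic bytes "Obj" plus byte 1, followed by the 
avro.schema metadata key), which suggests the input is Avro binary being read 
by a JSON parser. Since the command does not pass --source-class, a JSON-based 
source may be in use; that is an assumption, not something the log confirms. 
A minimal sketch for checking what the input files actually are, with 
hypothetical helper names and a local copy of one input file assumed:

```python
# Avro object container files start with the 4-byte magic b"Obj\x01".
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro_container(first_bytes: bytes) -> bool:
    """Return True if the bytes start with the Avro container magic."""
    return first_bytes.startswith(AVRO_MAGIC)


def looks_like_json(first_bytes: bytes) -> bool:
    """Return True if the first non-whitespace byte opens a JSON object or array."""
    stripped = first_bytes.lstrip()
    return stripped[:1] in (b"{", b"[")


def sniff_file(path: str) -> str:
    """Classify a local file as 'avro', 'json', or 'unknown' from its first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    if looks_like_avro_container(head):
        return "avro"
    if looks_like_json(head):
        return "json"
    return "unknown"
```

   If sniff_file reports "avro" for a file copied down from 
s3://zlanding1/input1/, the source reader would need to be one that 
understands Avro rather than JSON.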
   
   spark-submit \
   --jars "s3://zcustomjar/spark-avro_2.11-2.4.4.jar" \
   --deploy-mode "client" \
   --class "org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer" /usr/lib/hudi/hudi-utilities-bundle.jar \
   --schemaprovider-class "org.apache.hudi.utilities.schema.FilebasedSchemaProvider" \
   --table-type COPY_ON_WRITE \
   --source-ordering-field id \
   --target-base-path s3://ztrusted1/default/hudi-table1/ --target-table hudi-table1 \
   --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator \
   --hoodie-conf hoodie.datasource.write.recordkey.field=id \
   --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3://zlanding1/input1/ \
   --hoodie-conf hoodie.datasource.write.partitionpath.field=compcode \
   --hoodie-conf hoodie.datasource.write.operation=insert \
   --hoodie-conf hoodie.deltastreamer.schemaprovider.source.schema.file=s3://zcustomjar/source2.avsc \
   --hoodie-conf hoodie.deltastreamer.schemaprovider.target.schema.file=s3://zcustomjar/target.avsc
   
   
   I created the schema .avsc file manually in Notepad. I am not sure whether 
that is a problem. 
   
   {
     "type" : "record",
     "name" : "triprec",
     "fields" : [
     {
       "name" : "id",
       "type" : "string"
     }, {
       "name" : "creation_date",
       "type" : "string"
     }, {
       "name" : "last_update_time",
       "type" : "string"
     }, {
       "name" : "quantity",
       "type" : "string"
     }, {
       "name" : "compcode",
       "type" : "string"
     }]
   }
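
   As a rough check on the hand-written schema, the sketch below compares it 
with the schema embedded in the error message above. It only looks at 
nullability: the embedded schema declares every field as ["string","null"], 
while the .avsc above declares plain "string". Whether that difference matters 
for this job is not established here. The helper names are hypothetical, and 
the byte-order-mark check assumes the editor may have saved the file as UTF-8 
with a BOM.

```python
import json

# Schema text copied from the error message (embedded in the data file).
EMBEDDED = (
    '{"type":"record","name":"topLevelRecord","fields":['
    '{"name":"id","type":["string","null"]},'
    '{"name":"creation_date","type":["string","null"]},'
    '{"name":"last_update_time","type":["string","null"]},'
    '{"name":"quantity","type":["string","null"]},'
    '{"name":"compcode","type":["string","null"]}]}'
)

# Hand-written schema from the .avsc file, condensed onto fewer lines.
HANDWRITTEN = (
    '{"type":"record","name":"triprec","fields":['
    '{"name":"id","type":"string"},'
    '{"name":"creation_date","type":"string"},'
    '{"name":"last_update_time","type":"string"},'
    '{"name":"quantity","type":"string"},'
    '{"name":"compcode","type":"string"}]}'
)


def field_nullability(schema_json: str) -> dict:
    """Map each field name to True if its type is a union containing "null"."""
    schema = json.loads(schema_json)
    result = {}
    for field in schema["fields"]:
        field_type = field["type"]
        result[field["name"]] = isinstance(field_type, list) and "null" in field_type
    return result


def nullability_mismatches(schema_a: str, schema_b: str) -> list:
    """Return sorted names of shared fields whose nullability differs."""
    a, b = field_nullability(schema_a), field_nullability(schema_b)
    return sorted(name for name in a.keys() & b.keys() if a[name] != b[name])


def has_utf8_bom(raw: bytes) -> bool:
    """Return True if the raw file bytes start with a UTF-8 byte order mark."""
    return raw.startswith(b"\xef\xbb\xbf")
```

   Calling nullability_mismatches(EMBEDDED, HANDWRITTEN) lists the fields 
whose nullability differs between the two schemas, and has_utf8_bom can be run 
on the raw bytes of the .avsc file to rule out a stray BOM.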
   

