Re: How i can pass Hoodie configurations to spark job

Balaji Varadarajan Thu, 07 Mar 2019 09:02:52 -0800

It depends on which mechanism you use:
1. For the Spark DataSource route, you can use the "options" API of
DataFrameWriter to pass in these configs. Here is an example from
http://hudi.apache.org/incremental_processing.html:
inputDF.write()
       .format("com.uber.hoodie")
       .options(clientOpts) // any of the Hudi client opts can be passed in as well
       .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "_row_key")
       .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "partition")
       .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "timestamp")
       .option(HoodieWriteConfig.TABLE_NAME, tableName)
       .mode(SaveMode.Append)
       .save(basePath);
2. If you use HoodieWriteClient directly, construct a HoodieWriteConfig
object with the configs from the link you mentioned.
3. When ingesting with the HoodieDeltaStreamer tool, set the configs in a
properties file and pass that file via the command-line argument "--props".
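
For option 3, here is a minimal sketch of what such a properties file could
contain, generated with plain java.util.Properties. The two keys are what I
believe the string forms of the record-key and partition-path options are;
please verify them against the configurations page. The class name
DeltaStreamerProps and the header comment are made up for illustration.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

public class DeltaStreamerProps {

    // Collect the configs that would go into the file passed via --props.
    // Key names are assumptions; check them against the Hudi configurations page.
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("hoodie.datasource.write.recordkey.field", "_row_key");
        props.setProperty("hoodie.datasource.write.partitionpath.field", "partition");
        return props;
    }

    // Render the properties in the standard key=value file format.
    static String render(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, "Hudi configs for HoodieDeltaStreamer --props");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Redirect this output to a file and pass that file with --props.
        System.out.print(render(build()));
    }
}
```

Any other Hudi config can be added as one more key=value line in the same file.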


All file-size configs must be specified in bytes.
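
As a rough sketch of converting MB/GB values to bytes before passing them in:
the helper below builds a map of the kind that can be handed to
DataFrameWriter.options(...). The class name HoodieSizeOpts, the 120 MB and
100 MB values, and the two config keys are illustrative assumptions; check the
exact key names on the configurations page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HoodieSizeOpts {
    private static final long KB = 1024L;

    // Convert megabytes to bytes.
    static long mb(long n) {
        return n * KB * KB;
    }

    // Convert gigabytes to bytes.
    static long gb(long n) {
        return n * KB * KB * KB;
    }

    // Build client options with file sizes expressed in bytes.
    // Keys and sizes are examples only, not recommended settings.
    static Map<String, String> clientOpts() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.parquet.max.file.size", String.valueOf(mb(120)));
        opts.put("hoodie.parquet.small.file.limit", String.valueOf(mb(100)));
        return opts;
    }

    public static void main(String[] args) {
        clientOpts().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With these example values, 120 MB becomes 125829120 and 100 MB becomes
104857600.
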
Balaji.V


    On Thursday, March 7, 2019, 7:00:55 AM PST, [email protected] 
<[email protected]> wrote:  
 
 Hi All,
I found the Hoodie configurations at
http://hudi.apache.org/configurations.html. Please tell me how to pass these
configurations to the Spark job. Also, for the file-size configs, in which
unit should I give the value: MB, GB, or bytes?

Thanks & Regards
Rahul P
