It depends on which mechanism you use:
1. For the Spark DataSource route, you can use the "options" API of DataFrameWriter
to pass in these configs. Here is an example from
http://hudi.apache.org/incremental_processing.html:
import com.uber.hoodie.DataSourceWriteOptions;
import com.uber.hoodie.config.HoodieWriteConfig;
import org.apache.spark.sql.SaveMode;

// clientOpts is a java.util.Map<String, String> of Hudi configs
inputDF.write()
    .format("com.uber.hoodie")
    .options(clientOpts) // any of the Hudi client opts can be passed in as well
    .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "_row_key")
    .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "partition")
    .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "timestamp")
    .option(HoodieWriteConfig.TABLE_NAME, tableName)
    .mode(SaveMode.Append)
    .save(basePath);
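For reference, clientOpts in the example above is just a java.util.Map<String, String>, which DataFrameWriter.options() accepts. Below is a minimal JDK-only sketch of building one. The two file-size keys are taken from the configurations page and are assumed to apply to your Hudi version; the 120 MB / 100 MB values are illustrative, not recommendations.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a "clientOpts" map for DataFrameWriter.options(Map<String, String>).
// Key names follow the Hudi configurations page; values are illustrative.
final class HudiClientOpts {
    private HudiClientOpts() {}

    static Map<String, String> build() {
        Map<String, String> opts = new HashMap<>();
        // Target maximum size of each parquet file: 120 MB, expressed in bytes.
        opts.put("hoodie.parquet.max.file.size", String.valueOf(120L * 1024 * 1024));
        // Files below this size (100 MB, in bytes) count as "small" files
        // that Hudi may pad with new inserts instead of creating new files.
        opts.put("hoodie.parquet.small.file.limit", String.valueOf(100L * 1024 * 1024));
        return opts;
    }

    public static void main(String[] args) {
        // Print each config as key=value.
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting map can be passed straight to .options(...) in the write chain shown above.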
2. If you use HoodieWriteClient directly, construct a HoodieWriteConfig object
with the configs listed at the link you mentioned.
3. When ingesting with the HoodieDeltaStreamer tool, put the configs in a
properties file and pass the file path via the command-line argument "--props".
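A minimal sketch of preparing such a properties file with java.util.Properties. The temporary file name, the chosen keys and their values are illustrative assumptions; check the key names against the configurations page for your Hudi version. The printed path is what you would pass as "--props <path>".

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Sketch: write Hudi configs to a temporary properties file whose path could
// be handed to HoodieDeltaStreamer as "--props <path>". Values are illustrative.
final class DeltaStreamerProps {
    private DeltaStreamerProps() {}

    static Properties hudiConfigs() {
        Properties props = new Properties();
        props.setProperty("hoodie.datasource.write.recordkey.field", "_row_key");
        props.setProperty("hoodie.datasource.write.partitionpath.field", "partition");
        // File-size configs are plain byte counts: 120 MB = 125829120 bytes.
        props.setProperty("hoodie.parquet.max.file.size", "125829120");
        return props;
    }

    // Creates a temp file, stores the configs in key=value form, returns its path.
    static Path writeTempFile() {
        try {
            Path file = Files.createTempFile("hudi-deltastreamer", ".properties");
            try (Writer out = Files.newBufferedWriter(file)) {
                hudiConfigs().store(out, "Hudi configs for HoodieDeltaStreamer --props");
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads a properties file back, e.g. to double-check what will be passed in.
    static Properties read(Path file) {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(file)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Path file = writeTempFile();
        System.out.println("--props " + file);
    }
}
```

In practice you would more likely keep a hand-written properties file under version control; generating it here just makes the expected key=value format explicit.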
All file-size configs must be specified in bytes.
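If you think in MB or GB, convert before setting the value. The helper below is hypothetical (not part of Hudi) and assumes binary units, i.e. 1 MB = 1024 * 1024 bytes.

```java
import java.util.Locale;

// Hypothetical helper (not part of Hudi): converts a human-friendly size into
// the plain byte count that Hudi's file-size configs expect.
// Assumes binary units: 1 KB = 1024 B, 1 MB = 1024 KB, 1 GB = 1024 MB.
final class SizeUnits {
    private SizeUnits() {}

    static long toBytes(long amount, String unit) {
        switch (unit.toUpperCase(Locale.ROOT)) {
            case "B":
                return amount;
            case "KB":
                return amount * 1024L;
            case "MB":
                return amount * 1024L * 1024L;
            case "GB":
                return amount * 1024L * 1024L * 1024L;
            default:
                throw new IllegalArgumentException("Unknown unit: " + unit);
        }
    }

    public static void main(String[] args) {
        // e.g. a value for hoodie.parquet.max.file.size when targeting 120 MB
        System.out.println(toBytes(120, "MB"));
    }
}
```

For example, toBytes(120, "MB") gives 125829120 and toBytes(1, "GB") gives 1073741824.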
Balaji.V
On Thursday, March 7, 2019, 7:00:55 AM PST, Rahul P wrote:
Hi all,
I found the Hoodie-related configurations at
http://hudi.apache.org/configurations.html. Could you please tell me how to pass
these configurations to a Spark job? Also, for the file-size-related configs,
should the values be given in MB, GB, or bytes?
Thanks & Regards
Rahul P