[
https://issues.apache.org/jira/browse/HUDI-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-6041:
---------------------------------
Labels: pull-request-available (was: )
> add `properties` to Hudi Spark Procedures
> -----------------------------------------
>
> Key: HUDI-6041
> URL: https://issues.apache.org/jira/browse/HUDI-6041
> Project: Apache Hudi
> Issue Type: Improvement
> Components: bootstrap, spark-sql
> Reporter: lvyanquan
> Priority: Major
> Labels: pull-request-available
>
> Currently, to pass extra properties to the [Bootstrap
> Procedure|https://hudi.apache.org/docs/next/procedures#run_bootstrap], we have to
> write them to a file on HDFS and set `props_file_path`, which makes the procedure
> troublesome to call, for example:
> {code:sql}
> call run_bootstrap(table => 'test_hudi_table', table_type => 'COPY_ON_WRITE',
> bootstrap_path => 'hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table',
> base_path => 'hdfs://ns1//tmp/hoodie/test_hudi_table',
> rowKey_field => 'id', partition_path_field => 'dt',
> props_file_path => 'hdfs://ns1//tmp/tableProp.txt'); {code}
> Alternatively, we can set those properties through the session config, which
> means executing several `set` statements before calling the procedure.
> We can add a new procedure input parameter named `properties` and
> collect key-value pairs from it, like:
> {code:sql}
> call run_bootstrap(table => 'test_hudi_table', table_type => 'COPY_ON_WRITE',
> bootstrap_path => 'hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table',
> base_path => 'hdfs://ns1//tmp/hoodie/test_hudi_table',
> rowKey_field => 'id', partition_path_field => 'dt',
> properties => 'hoodie.datasource.write.hive_style_partitioning=true'); {code}
> This way, we don't need to put another file on HDFS.
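
To illustrate what "collect key-value pairs" from a `properties` string could look like, here is a minimal sketch. It is not Hudi's implementation: the class name `ProcedurePropertiesSketch`, the method `parse`, and the comma separator between pairs are all assumptions made for illustration, since the issue only shows a single `key=value` pair.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hudi: turns "k1=v1,k2=v2" into a map.
class ProcedurePropertiesSketch {

    static Map<String, String> parse(String properties) {
        Map<String, String> result = new LinkedHashMap<>();
        if (properties == null || properties.trim().isEmpty()) {
            return result; // no extra properties supplied
        }
        // Assumption: pairs are comma-separated; the issue shows only one pair.
        for (String pair : properties.split(",")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray or trailing commas
            }
            // Split on the first '=' only, so values may themselves contain '='.
            int idx = trimmed.indexOf('=');
            if (idx <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + trimmed);
            }
            result.put(trimmed.substring(0, idx).trim(), trimmed.substring(idx + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Same value as the `properties` argument in the example call above.
        Map<String, String> props =
            parse("hoodie.datasource.write.hive_style_partitioning=true");
        props.forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

A procedure could merge a map like this over the session config before running, so that per-call properties take precedence. That precedence order is also an assumption here, not something the issue specifies.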
--
This message was sent by Atlassian Jira
(v8.20.10#820010)