Ethan Guo created HUDI-4001:
-------------------------------
Summary: "hoodie.datasource.write.operation" from table config
should not be used as write operation
Key: HUDI-4001
URL: https://issues.apache.org/jira/browse/HUDI-4001
Project: Apache Hudi
Issue Type: Task
Components: spark-sql
Reporter: Ethan Guo
[https://github.com/apache/hudi/issues/5248]
When I create a table with Spark SQL and set
{*}hoodie.datasource.write.operation{*}=upsert, later SQL statements such as
DELETE (see PR [#5215|https://github.com/apache/hudi/pull/5215]) and INSERT
OVERWRITE still pick up *hoodie.datasource.write.operation* from the table
config and run an upsert instead of the delete, insert_overwrite, etc. that
the statement requires.
Example:
Create a table and set hoodie.datasource.write.operation to upsert.
When I then run a SQL DELETE, the operation key set by the delete command is
overwritten by the hoodie.datasource.write.operation value from the table
config or environment: *OPERATION.key ->
DataSourceWriteOptions.DELETE_OPERATION_OPT_VAL* has no effect and is replaced
by *upsert*.
withSparkConf(sparkSession, hoodieCatalogTable.catalogProperties) {
  Map(
    "path" -> path,
    RECORDKEY_FIELD.key -> hoodieCatalogTable.primaryKeys.mkString(","),
    TBL_NAME.key -> tableConfig.getTableName,
    HIVE_STYLE_PARTITIONING.key -> tableConfig.getHiveStylePartitioningEnable,
    URL_ENCODE_PARTITIONING.key -> tableConfig.getUrlEncodePartitioning,
    KEYGENERATOR_CLASS_NAME.key -> classOf[SqlKeyGenerator].getCanonicalName,
    SqlKeyGenerator.ORIGIN_KEYGEN_CLASS_NAME -> tableConfig.getKeyGeneratorClassName,
    OPERATION.key -> DataSourceWriteOptions.DELETE_OPERATION_OPT_VAL,
    PARTITIONPATH_FIELD.key -> tableConfig.getPartitionFieldProp,
    HiveSyncConfig.HIVE_SYNC_MODE.key -> HiveSyncMode.HMS.name(),
    HiveSyncConfig.HIVE_SUPPORT_TIMESTAMP_TYPE.key -> "true",
    HoodieWriteConfig.DELETE_PARALLELISM_VALUE.key -> "200",
    SqlKeyGenerator.PARTITION_SCHEMA -> partitionSchema.toDDL
  )
}
So, for SQL, how about not writing this option to hoodie.properties at all,
restricting it during the SQL check, and letting each command set the write
operation itself at runtime?
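
To make the precedence problem concrete, here is a minimal, self-contained sketch. It is not Hudi's actual withSparkConf implementation: it assumes only that table-level properties are merged on top of the command's own options, which matches the behavior reported above. The names mergeTableLast and mergeCommandOperationWins and the path /tmp/hudi_table are hypothetical and used only for illustration.

```scala
object WriteOperationPrecedence {
  val OperationKey = "hoodie.datasource.write.operation"

  // Hypothetical stand-in for the reported merge behavior (assumption):
  // table properties are applied after the command's options, so on a
  // key collision the table-level value wins.
  def mergeTableLast(commandOpts: Map[String, String],
                     tableProps: Map[String, String]): Map[String, String] =
    commandOpts ++ tableProps.filter { case (k, _) => k.startsWith("hoodie.") }

  // One possible fix (assumption): drop the write operation from the
  // table-level properties so the value chosen by the SQL command survives.
  def mergeCommandOperationWins(commandOpts: Map[String, String],
                                tableProps: Map[String, String]): Map[String, String] =
    mergeTableLast(commandOpts, tableProps - OperationKey)

  def main(args: Array[String]): Unit = {
    // Table created with hoodie.datasource.write.operation=upsert.
    val tableProps = Map(OperationKey -> "upsert")
    // Options a DELETE command would build for itself.
    val deleteOpts = Map(OperationKey -> "delete", "path" -> "/tmp/hudi_table")

    // The table-level value silently replaces the command's operation.
    println(mergeTableLast(deleteOpts, tableProps)(OperationKey))
    // With the key filtered out, the command's operation is kept.
    println(mergeCommandOperationWins(deleteOpts, tableProps)(OperationKey))
  }
}
```

In this sketch the first merge yields upsert and the second yields delete. Filtering the key at merge time is only one option; not persisting it to hoodie.properties, as proposed above, would avoid the collision at the source.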
--
This message was sent by Atlassian Jira
(v8.20.7#820007)