nsivabalan commented on code in PR #9665:
URL: https://github.com/apache/hudi/pull/9665#discussion_r1322266556
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##########
@@ -193,7 +196,7 @@ trait ProvidesHoodieConfig extends Logging {
DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
val dropDuplicate = sparkSession.conf
.getOption(INSERT_DROP_DUPS.key).getOrElse(INSERT_DROP_DUPS.defaultValue).toBoolean
-    val autoGenerateRecordKeys : Boolean = !combinedOpts.contains(KeyGeneratorOptions.RECORDKEY_FIELD_NAME.key());
+ val shouldAutoKeyGen: Boolean = shouldAutoGenerateRecordKeys(combinedOpts)
Review Comment:
Thanks for this fix.
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##########
@@ -208,19 +211,22 @@ trait ProvidesHoodieConfig extends Logging {
val isPartitionedTable = hoodieCatalogTable.partitionFields.nonEmpty
val combineBeforeInsert = hoodieCatalogTable.preCombineKey.nonEmpty &&
hoodieCatalogTable.primaryKeys.nonEmpty
-    // try to use sql write operation instead of legacy insert mode. If only insert mode is explicitly specified, w/o specifying
-    // any value for sql write operation, leagcy configs will be honored. But on all other cases (i.e when neither of the configs is set,
-    // or when both configs are set, or when only sql write operation is set), we honor sql write operation and ignore
-    // the insert mode.
+    /*
+     * The sql write operation has higher precedence than the legacy insert mode.
+     * Only when the legacy insert mode is explicitly set, without setting the sql write operation,
+     * will the legacy configs be honored. In all other cases (i.e., when neither is set, when both are set,
+     * or when only the sql write operation is set), we honor the sql write operation.
+     */
val useLegacyInsertModeFlow = insertModeSet && !sqlWriteOperationSet
var operation = combinedOpts.getOrElse(OPERATION.key,
if (useLegacyInsertModeFlow) {
        // NOTE: Target operation could be overridden by the user, therefore if it has been provided as an input
        //       we'd prefer that value over auto-deduced operation. Otherwise, we deduce target operation type
        deduceOperation(enableBulkInsert, isOverwritePartition, isOverwriteTable, dropDuplicate,
-         isNonStrictMode, isPartitionedTable, combineBeforeInsert, insertMode, autoGenerateRecordKeys)
+         isNonStrictMode, isPartitionedTable, combineBeforeInsert, insertMode, shouldAutoKeyGen)
Review Comment:
We should try to align variable names with config keys.
We should fix the naming of "sqlWriteOperation" -> "sparkSqlInsertIntoOperation".
If we are going to push any additional fixes to this patch, let's do that.
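
The precedence rule described in the new comment above can be sketched as a small truth table. This is a minimal, self-contained sketch with hypothetical names, not the actual Hudi code path: the legacy insert-mode flow is taken only when the insert mode is the sole config explicitly set.

```scala
// Sketch (hypothetical object/method names) of the precedence rule:
// the sql write operation wins unless ONLY the legacy insert mode is set.
object PrecedenceSketch {
  def useLegacyInsertModeFlow(insertModeSet: Boolean, sqlWriteOperationSet: Boolean): Boolean =
    insertModeSet && !sqlWriteOperationSet

  def main(args: Array[String]): Unit = {
    // Only the legacy insert mode is set -> legacy configs are honored.
    assert(useLegacyInsertModeFlow(insertModeSet = true, sqlWriteOperationSet = false))
    // Both set, neither set, or only the sql write operation set -> sql write operation wins.
    assert(!useLegacyInsertModeFlow(insertModeSet = true, sqlWriteOperationSet = true))
    assert(!useLegacyInsertModeFlow(insertModeSet = false, sqlWriteOperationSet = false))
    assert(!useLegacyInsertModeFlow(insertModeSet = false, sqlWriteOperationSet = true))
    println("ok")
  }
}
```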
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestInsertTable.scala:
##########
@@ -1727,25 +1727,26 @@ class TestInsertTable extends HoodieSparkSqlTestBase {
}
/**
- * When neither of strict mode nor sql.write.operation is set, sql write operation takes precedence and default value is chosen.
+ * When neither strict mode nor sql.write.operation is set, the sql write operation is deduced as UPSERT
+ * due to the presence of preCombineField.
Review Comment:
Do we have a similar test where preCombine is not set?
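
To illustrate the branch the question points at: a hypothetical, heavily simplified sketch of the preCombine-driven deduction described in the doc comment above. The real deduceOperation also weighs bulk insert, overwrite modes, strict mode, etc.; the names and the INSERT fallback here are assumptions for illustration, not Hudi's actual signature.

```scala
// Hypothetical simplification: with a preCombine field configured, the
// deduced default is UPSERT; without one, presumably INSERT.
object DeduceSketch {
  def deduceDefaultOperation(hasPreCombineField: Boolean): String =
    if (hasPreCombineField) "upsert" else "insert"

  def main(args: Array[String]): Unit = {
    assert(deduceDefaultOperation(hasPreCombineField = true) == "upsert")
    assert(deduceDefaultOperation(hasPreCombineField = false) == "insert")
    println("ok")
  }
}
```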
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]