nsivabalan commented on code in PR #9665:
URL: https://github.com/apache/hudi/pull/9665#discussion_r1322266556
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##########
@@ -193,7 +196,7 @@ trait ProvidesHoodieConfig extends Logging {
DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
val dropDuplicate = sparkSession.conf
.getOption(INSERT_DROP_DUPS.key).getOrElse(INSERT_DROP_DUPS.defaultValue).toBoolean
-    val autoGenerateRecordKeys : Boolean = !combinedOpts.contains(KeyGeneratorOptions.RECORDKEY_FIELD_NAME.key());
+ val shouldAutoKeyGen: Boolean = shouldAutoGenerateRecordKeys(combinedOpts)
Review Comment:
Thanks for this fix.
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##########
@@ -208,19 +211,22 @@ trait ProvidesHoodieConfig extends Logging {
val isPartitionedTable = hoodieCatalogTable.partitionFields.nonEmpty
val combineBeforeInsert = hoodieCatalogTable.preCombineKey.nonEmpty &&
hoodieCatalogTable.primaryKeys.nonEmpty
-    // try to use sql write operation instead of legacy insert mode. If only insert mode is explicitly specified, w/o specifying
-    // any value for sql write operation, leagcy configs will be honored. But on all other cases (i.e when neither of the configs is set,
-    // or when both configs are set, or when only sql write operation is set), we honor sql write operation and ignore
-    // the insert mode.
+    /*
+     * The sql write operation has higher precedence than the legacy insert mode.
+     * Only when the legacy insert mode is explicitly set, without setting the sql write operation,
+     * will the legacy configs be honored. In all other cases (i.e., when neither is set, when both are set,
+     * or when only the sql write operation is set), we honor the sql write operation.
+     */
val useLegacyInsertModeFlow = insertModeSet && !sqlWriteOperationSet
var operation = combinedOpts.getOrElse(OPERATION.key,
if (useLegacyInsertModeFlow) {
        // NOTE: Target operation could be overridden by the user, therefore if it has been provided as an input
        //       we'd prefer that value over auto-deduced operation. Otherwise, we deduce target operation type
        deduceOperation(enableBulkInsert, isOverwritePartition, isOverwriteTable, dropDuplicate,
-         isNonStrictMode, isPartitionedTable, combineBeforeInsert, insertMode, autoGenerateRecordKeys)
+         isNonStrictMode, isPartitionedTable, combineBeforeInsert, insertMode, shouldAutoKeyGen)
Review Comment:
We should try to align variable names with config keys.
We should fix the naming of "sqlWriteOperation" -> "sparkSqlInsertIntoOperation".
If we are going to push any additional fixes to this patch, let's do that.
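
The precedence rule described in the new comment above can be sketched as a small truth table. This is a minimal, self-contained sketch with hypothetical names, not the actual Hudi code path: the legacy insert-mode flow is taken only when the insert mode is the sole config explicitly set.

```scala
// Sketch (hypothetical object/method names) of the precedence rule:
// the sql write operation wins unless ONLY the legacy insert mode is set.
object PrecedenceSketch {
  def useLegacyInsertModeFlow(insertModeSet: Boolean, sqlWriteOperationSet: Boolean): Boolean =
    insertModeSet && !sqlWriteOperationSet

  def main(args: Array[String]): Unit = {
    // Only the legacy insert mode is set -> legacy configs are honored.
    assert(useLegacyInsertModeFlow(insertModeSet = true, sqlWriteOperationSet = false))
    // Both set, neither set, or only the sql write operation set -> sql write operation wins.
    assert(!useLegacyInsertModeFlow(insertModeSet = true, sqlWriteOperationSet = true))
    assert(!useLegacyInsertModeFlow(insertModeSet = false, sqlWriteOperationSet = false))
    assert(!useLegacyInsertModeFlow(insertModeSet = false, sqlWriteOperationSet = true))
    println("ok")
  }
}
```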
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestInsertTable.scala:
##########
@@ -1727,25 +1727,26 @@ class TestInsertTable extends HoodieSparkSqlTestBase {
}
/**
- * When neither of strict mode nor sql.write.operation is set, sql write operation takes precedence and default value is chosen.
+ * When neither strict mode nor sql.write.operation is set, the sql write operation is deduced as UPSERT
+ * due to the presence of preCombineField.
Review Comment:
Do we have a similar test where preCombine is not set?
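
To illustrate the branch the question points at: a hypothetical, heavily simplified sketch of the preCombine-driven deduction described in the doc comment above. The real deduceOperation also weighs bulk insert, overwrite modes, strict mode, etc.; the names and the INSERT fallback here are assumptions for illustration, not Hudi's actual signature.

```scala
// Hypothetical simplification: with a preCombine field configured, the
// deduced default is UPSERT; without one, presumably INSERT.
object DeduceSketch {
  def deduceDefaultOperation(hasPreCombineField: Boolean): String =
    if (hasPreCombineField) "upsert" else "insert"

  def main(args: Array[String]): Unit = {
    assert(deduceDefaultOperation(hasPreCombineField = true) == "upsert")
    assert(deduceDefaultOperation(hasPreCombineField = false) == "insert")
    println("ok")
  }
}
```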
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]