[GitHub] [hudi] jonvex commented on a diff in pull request #8697: [HUDI-5514] Improving usability/performance with out of box default for append only use-cases

via GitHub Wed, 21 Jun 2023 06:21:11 -0700


jonvex commented on code in PR #8697:
URL: https://github.com/apache/hudi/pull/8697#discussion_r1236994454



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##########
@@ -142,24 +144,27 @@ trait ProvidesHoodieConfig extends Logging {
     //       we'd prefer that value over auto-deduced operation. Otherwise, we deduce target operation type
     val operationOverride = combinedOpts.get(DataSourceWriteOptions.OPERATION.key)
     val operation = operationOverride.getOrElse {
-      (enableBulkInsert, isOverwritePartition, isOverwriteTable, dropDuplicate, isNonStrictMode, isPartitionedTable) match {
-        case (true, _, _, _, false, _) =>
+      (enableBulkInsert, isOverwritePartition, isOverwriteTable, dropDuplicate, isNonStrictMode, isPartitionedTable,
+      autoGenerateRecordKeys) match {
+        case (true, _, _, _, false, _, _) =>
           throw new IllegalArgumentException(s"Table with primaryKey can not use bulk insert in ${insertMode.value()} mode.")
-        case (true, true, _, _, _, true) =>
+        case (true, true, _, _, _, true, _) =>
           throw new IllegalArgumentException(s"Insert Overwrite Partition can not use bulk insert.")
-        case (true, _, _, true, _, _) =>
+        case (true, _, _, true, _, _, _) =>
           throw new IllegalArgumentException(s"Bulk insert cannot support drop duplication." +
             s" Please disable $INSERT_DROP_DUPS and try again.")
         // if enableBulkInsert is true, use bulk insert for the insert overwrite non-partitioned table.
-        case (true, false, true, _, _, false) => BULK_INSERT_OPERATION_OPT_VAL
+        case (true, false, true, _, _, false, _) => BULK_INSERT_OPERATION_OPT_VAL
         // insert overwrite table
-        case (false, false, true, _, _, _) => INSERT_OVERWRITE_TABLE_OPERATION_OPT_VAL
+        case (false, false, true, _, _, _, _) => INSERT_OVERWRITE_TABLE_OPERATION_OPT_VAL
         // insert overwrite partition
-        case (_, true, false, _, _, true) => INSERT_OVERWRITE_OPERATION_OPT_VAL
+        case (_, true, false, _, _, true, _) => INSERT_OVERWRITE_OPERATION_OPT_VAL
         // disable dropDuplicate, and provide preCombineKey, use the upsert operation for strict and upsert mode.
-        case (false, false, false, false, false, _) if hasPrecombineColumn => UPSERT_OPERATION_OPT_VAL
+        case (false, false, false, false, false, _, _) if hasPrecombineColumn => UPSERT_OPERATION_OPT_VAL
         // if table is pk table and has enableBulkInsert use bulk insert for non-strict mode.
-        case (true, _, _, _, true, _) => BULK_INSERT_OPERATION_OPT_VAL
+        case (true, _, _, _, true, _, _) => BULK_INSERT_OPERATION_OPT_VAL
+        // if auto record key generation is enabled, use bulk_insert
+        case (_, _, _, _, _, true, true) => BULK_INSERT_OPERATION_OPT_VAL

Review Comment:
   Why does it need to be a partitioned table? 
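
   To make the question concrete: the sixth tuple position is `isPartitionedTable`, so the new case only fires when that flag is `true`. Below is a minimal sketch, reduced to the two flags in question. The object name, both function names, and the `"fallback"` placeholder are hypothetical; the hunk does not show which case a non-partitioned table with auto-generated keys would reach next, so the placeholder stands in for "whatever later cases decide".

```scala
// Hypothetical two-flag reduction of the match in the diff; not Hudi code.
object OperationSketch {
  // Mirrors the added case `(_, _, _, _, _, true, true)`:
  // both isPartitionedTable and autoGenerateRecordKeys must be true.
  def asWritten(isPartitionedTable: Boolean, autoGenerateRecordKeys: Boolean): String =
    (isPartitionedTable, autoGenerateRecordKeys) match {
      case (true, true) => "bulk_insert"
      // Placeholder for any later case; the real fall-through is not shown in the hunk.
      case _ => "fallback"
    }

  // Assumed alternative: wildcard the partitioning flag so only
  // autoGenerateRecordKeys decides.
  def ignoringPartitioning(isPartitionedTable: Boolean, autoGenerateRecordKeys: Boolean): String =
    (isPartitionedTable, autoGenerateRecordKeys) match {
      case (_, true) => "bulk_insert"
      case _ => "fallback"
    }
}
```

   With `asWritten`, a non-partitioned table that auto-generates record keys does not get `bulk_insert`; with the wildcard variant it does. Whether that is the intended behavior is what the comment asks.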



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
