[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391810#comment-17391810 ]
ASF GitHub Bot commented on HUDI-2208:
--------------------------------------

nsivabalan commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r680197209

##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##########
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
       .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
       .toBoolean

-    val operation = if (isOverwrite) {
-      if (table.partitionColumnNames.nonEmpty) {
-        INSERT_OVERWRITE_OPERATION_OPT_VAL  // overwrite partition
-      } else {
-        INSERT_OPERATION_OPT_VAL
+    val enableBulkInsert = parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+      DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+    val isPartitionedTable = table.partitionColumnNames.nonEmpty
+    val isPrimaryKeyTable = primaryColumns.nonEmpty
+    val operation =
+      (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+        case (true, true, _, _) =>
+          throw new IllegalArgumentException(s"Table with primaryKey can not use bulk insert.")

Review comment:
Anyway, we can call out that it is the user's responsibility to ensure uniqueness. Also, IIUC, Hudi can handle duplicates: in case of updates, both records will be updated. But bulk_insert is very performant compared to regular insert, especially with the row writer, so we should not make it too restrictive to use. I know from community messages that a lot of users leverage bulk_insert. I would vote to relax this constraint.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
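One way to read the relaxation proposed in the review comment is that the primary-key case would warn instead of throw. The sketch below is illustrative only and is not the code in the pull request: `BulkInsertChoiceSketch`, `Choice`, and `chooseOperation` are hypothetical names, the operation strings are local stand-ins rather than constants imported from Hudi, and the fallback to upsert or insert when bulk insert is disabled is an assumption. Overwrite and drop-duplicates handling from the diff are deliberately left out.

```scala
object BulkInsertChoiceSketch {
  // Local stand-ins for operation names; these are NOT imported from Hudi.
  val Insert = "insert"
  val BulkInsert = "bulk_insert"
  val Upsert = "upsert"

  // The chosen operation plus an optional warning to surface to the user.
  final case class Choice(operation: String, warning: Option[String])

  // Hypothetical relaxed rule: bulk insert is allowed on a primary-key table,
  // but the caller is told that key uniqueness is their responsibility.
  def chooseOperation(isPrimaryKeyTable: Boolean, enableBulkInsert: Boolean): Choice =
    (isPrimaryKeyTable, enableBulkInsert) match {
      case (true, true) =>
        Choice(BulkInsert, Some("bulk insert does not deduplicate; the user must ensure key uniqueness"))
      case (false, true) =>
        Choice(BulkInsert, None)
      case (true, false) =>
        Choice(Upsert, None) // assumed fallback for primary-key tables
      case (false, false) =>
        Choice(Insert, None) // assumed fallback for tables without a primary key
    }
}
```

With this shape, `BulkInsertChoiceSketch.chooseOperation(isPrimaryKeyTable = true, enableBulkInsert = true)` yields the bulk insert operation together with a warning, rather than raising an `IllegalArgumentException`.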
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> [SQL] Support Bulk Insert For Spark Sql
> ---------------------------------------
>
>                 Key: HUDI-2208
>                 URL: https://issues.apache.org/jira/browse/HUDI-2208
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: pengzhiwei
>            Assignee: pengzhiwei
>            Priority: Blocker
>              Labels: pull-request-available, release-blocker
>
> Support the bulk insert for spark sql

--
This message was sent by Atlassian Jira
(v8.3.4#803005)