[jira] [Commented] (HUDI-2208) [SQL] Support Bulk Insert For Spark Sql

ASF GitHub Bot (Jira) Mon, 02 Aug 2021 15:04:06 -0700


    [ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391810#comment-17391810
 ]


ASF GitHub Bot commented on HUDI-2208:
--------------------------------------

nsivabalan commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r680197209



##########
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##########
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
       .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
       .toBoolean
 
-    val operation = if (isOverwrite) {
-      if (table.partitionColumnNames.nonEmpty) {
-        INSERT_OVERWRITE_OPERATION_OPT_VAL  // overwrite partition
-      } else {
-        INSERT_OPERATION_OPT_VAL
+    val enableBulkInsert = 
parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+      DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+    val isPartitionedTable = table.partitionColumnNames.nonEmpty
+    val isPrimaryKeyTable = primaryColumns.nonEmpty
+    val operation =
+      (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+        case (true, true, _, _) =>
+          throw new IllegalArgumentException(s"Table with primaryKey can not 
use bulk insert.")

Review comment:
       anyways, we can call it out that its responsibility of the user to 
ensure there are uniqueness. Also, IIUC, hudi can handle duplicates. Incase of 
updates, both records will be updated. but bulk_insert is very performant 
compared to regular Insert especially w/ row wirter. So, we should not keep it 
too restrictive for use. I know from the community msgs, that lot of users 
leverage bulk_insert. I would vote to relax this constraint. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


> [SQL] Support Bulk Insert For Spark Sql
> ---------------------------------------
>
>                 Key: HUDI-2208
>                 URL: https://issues.apache.org/jira/browse/HUDI-2208
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: pengzhiwei
>            Assignee: pengzhiwei
>            Priority: Blocker
>              Labels: pull-request-available, release-blocker
>
> Support the bulk insert for spark sql



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HUDI-2208) [SQL] Support Bulk Insert For Spark Sql

Reply via email to