[ https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387360#comment-17387360 ]

ASF GitHub Bot commented on HUDI-2208:
--------------------------------------

pengzhiwei2018 commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r676621769



##########
File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##########
@@ -247,6 +247,13 @@ object DataSourceWriteOptions {
     .defaultValue("false")
     .withDocumentation("When set to true, will perform write operations directly using the spark native " +
       "`Row` representation, avoiding any additional conversion costs.")
+  /**
+   * Enable the bulk insert for sql insert statement.
+   */
+  val SQL_ENABLE_BULK_INSERT:ConfigProperty[String] = ConfigProperty
+    .key("hoodie.sq.bulk.insert.enable")

Review comment:
       Good catch!
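
       (For reference, a minimal sketch of the corrected definition, assuming the
       intended key is `hoodie.sql.bulk.insert.enable`; the default value and the
       documentation string below are illustrative, following the pattern of the
       neighbouring options, not the final code:)

           import org.apache.hudi.common.config.ConfigProperty

           /**
            * Enable bulk insert for the SQL insert statement.
            */
           val SQL_ENABLE_BULK_INSERT: ConfigProperty[String] = ConfigProperty
             .key("hoodie.sql.bulk.insert.enable") // corrected from "hoodie.sq.bulk.insert.enable"
             .defaultValue("false")                // assumed default
             .withDocumentation("When set to true, the SQL insert statement will use bulk insert.")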

##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##########
@@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand {
         RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","),
         PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields,
         PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName,
+        ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString,

Review comment:
       We already have ENABLE_ROW_WRITER_OPT_KEY to configure the row writer.
`enableBulkInsert` is only used to control whether the row writer is enabled for
bulk insert.
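
       A rough sketch of how the two flags could be wired together when building
       the write options (the `*_OPT_KEY` constants are the existing
       DataSourceWriteOptions ones used in the diff; `enableBulkInsert` is the
       value of the new SQL config, so this is an assumed shape, not the final
       code):

           // Pick the write operation from the bulk-insert flag, and reuse the
           // existing row-writer switch instead of introducing a second one.
           val operation =
             if (enableBulkInsert) BULK_INSERT_OPERATION_OPT_VAL
             else INSERT_OPERATION_OPT_VAL

           val writeOptions = Map(
             OPERATION_OPT_KEY.key -> operation,
             ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString
           )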

##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##########
@@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand {
         RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","),
         PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields,
         PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName,
+        ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString,
+        HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key -> isPrimaryKeyTable.toString, // if the table has primaryKey, enable the combine

Review comment:
       Because we must guarantee uniqueness for a primary-key table, we should
combine (deduplicate) the input records for such tables.
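
       A small sketch of the intent (`isPrimaryKeyTable` is assumed to be derived
       from the table's primary-key columns, e.g. `primaryColumns.nonEmpty`):

           // Deduplicate the input before insert only when the table defines a
           // primary key, because uniqueness must hold for primary-key tables.
           val combineBeforeInsert = isPrimaryKeyTable

           val combineOptions = Map(
             HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key -> combineBeforeInsert.toString
           )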




> Support Bulk Insert For Spark Sql
> ---------------------------------
>
>                 Key: HUDI-2208
>                 URL: https://issues.apache.org/jira/browse/HUDI-2208
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: pengzhiwei
>            Assignee: pengzhiwei
>            Priority: Major
>              Labels: pull-request-available
>
> Support bulk insert for Spark SQL


