[
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385570#comment-17385570
]
ASF GitHub Bot commented on HUDI-2208:
--------------------------------------
pengzhiwei2018 commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r674875373
##########
File path:
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/HoodieOptionConfig.scala
##########
@@ -172,6 +178,15 @@ object HoodieOptionConfig {
params.get(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY.key)
}
+ /**
+  * Whether to enable bulk insert for a SQL insert statement when the table has no primaryKey.
+  */
+ def enableBulkInsert(options: Map[String, String]): Boolean = {
Review comment:
I saw that ENABLE_ROW_WRITER_OPT_KEY is currently only used for bulk insert, so I reused this config.
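The comment above can be sketched as a small predicate: bulk insert is chosen for a SQL INSERT only when the row-writer option is on and the table defines no primary key. This is a hypothetical sketch, not the PR's actual implementation; the option key strings and the `primaryKey` table-property name are assumptions.

```scala
// Hedged sketch of the enableBulkInsert predicate discussed in the review.
// The key names below are assumptions for illustration, not verified Hudi constants.
object BulkInsertSketch {
  val EnableRowWriterKey = "hoodie.datasource.write.row.writer.enable" // assumed key name
  val PrimaryKeyKey      = "primaryKey"                                // assumed table property

  // Bulk insert only when the row writer is enabled AND no primary key is defined.
  def enableBulkInsert(options: Map[String, String]): Boolean =
    options.getOrElse(EnableRowWriterKey, "false").toBoolean &&
      !options.contains(PrimaryKeyKey)

  def main(args: Array[String]): Unit = {
    assert(enableBulkInsert(Map(EnableRowWriterKey -> "true")))
    assert(!enableBulkInsert(Map(EnableRowWriterKey -> "true", PrimaryKeyKey -> "id")))
    assert(!enableBulkInsert(Map.empty))
    println("ok")
  }
}
```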
##########
File path:
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -159,7 +159,10 @@ object HoodieSparkSqlWriter {
 // Convert to RDD[HoodieRecord]
 val genericRecords: RDD[GenericRecord] = HoodieSparkUtils.createRdd(df, schema, structName, nameSpace)
- val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean || operation.equals(WriteOperationType.UPSERT);
+ val shouldCombine = parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean ||
+   operation.equals(WriteOperationType.UPSERT) ||
+   parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key(),
Review comment:
Yes, if COMBINE_BEFORE_INSERT_PROP is enabled for insert, the precombine field value is not computed, which will produce incorrect results for inserts with duplicate records.
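The change above widens the `shouldCombine` predicate so that combine-before-insert also triggers precombining for plain inserts, not just upserts or drop-duplicates. A minimal sketch of that logic, using plain string keys and operation names in place of the real Hudi constants (those spellings are assumptions for illustration):

```scala
// Hedged sketch of the widened shouldCombine predicate from the diff above.
// Key strings and operation names are assumptions, not verified Hudi constants.
object ShouldCombineSketch {
  val InsertDropDups      = "hoodie.datasource.write.insert.drop.duplicates" // assumed key
  val CombineBeforeInsert = "hoodie.combine.before.insert"                   // assumed key

  def shouldCombine(parameters: Map[String, String], operation: String): Boolean =
    parameters.getOrElse(InsertDropDups, "false").toBoolean ||
      operation == "upsert" ||
      parameters.getOrElse(CombineBeforeInsert, "false").toBoolean // the new clause

  def main(args: Array[String]): Unit = {
    // Upserts always combine, regardless of flags.
    assert(shouldCombine(Map.empty, "upsert"))
    // The PR's change: combine-before-insert now also combines for inserts,
    // so the precombine field is honored for duplicate records.
    assert(shouldCombine(Map(CombineBeforeInsert -> "true"), "insert"))
    // A plain insert with no flags set still skips combining.
    assert(!shouldCombine(Map.empty, "insert"))
    println("ok")
  }
}
```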
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
> Support Bulk Insert For Spark Sql
> ---------------------------------
>
> Key: HUDI-2208
> URL: https://issues.apache.org/jira/browse/HUDI-2208
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: pengzhiwei
> Assignee: pengzhiwei
> Priority: Major
> Labels: pull-request-available
>
> Support bulk insert for Spark SQL
--
This message was sent by Atlassian Jira
(v8.3.4#803005)