jonvex commented on code in PR #8875:
URL: https://github.com/apache/hudi/pull/8875#discussion_r1218286181
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##########
@@ -89,6 +89,62 @@ trait ProvidesHoodieConfig extends Logging {
defaultOpts = defaultOpts, overridingOpts = overridingOpts)
}
+ /**
+ * Determine the insert operation: use bulk insert when the configs allow it,
+ * otherwise fall back to deduceOperation.
+ */
+ private def getOperation(isPartitionedTable: Boolean,
+ isOverwritePartition: Boolean,
+ isOverwriteTable: Boolean,
+ insertModeSet: Boolean,
+ dropDuplicate: Option[String],
+ enableBulkInsert: Option[String],
+ isInsertInto: Boolean,
+ isNonStrictMode: Boolean,
+ hasPrecombineColumn: Boolean): String = {
+ val notSetToNonStrict = !insertModeSet || isNonStrictMode
+ // If these options are unset, default to values compatible with bulk insert
+ (isInsertInto, notSetToNonStrict, enableBulkInsert.getOrElse("true"),
+   dropDuplicate.getOrElse("false"), isOverwritePartition, isPartitionedTable) match {
+   case (true, true, "true", "false", false, _) => BULK_INSERT_OPERATION_OPT_VAL
Review Comment:
Consider the case where the user sets only "hoodie.sql.bulk.insert.enable". With
your suggestion, we would not end up using bulk insert, because the default of
"hoodie.sql.insert.mode" is "upsert". Given that the config's documentation reads
"When set to true, the sql insert statement will use bulk insert.", I think such
a user intends to use bulk insert. The way I made it work is that we assume the
user wants bulk insert until a config is set that is incompatible with bulk
insert; in that situation, we fall back to the original logic.
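A minimal sketch of the precedence I'm describing (names and structure are illustrative, not the actual Hudi implementation): unset configs default to bulk-insert-friendly values, and only an explicitly set incompatible config forces the fallback path.

```scala
// Hypothetical sketch, not the real ProvidesHoodieConfig code:
// assume bulk insert unless an explicitly set config rules it out.
object InsertOperationSketch {
  val BulkInsert = "bulk_insert"
  val FallbackToDeduce = "deduce"

  def chooseOperation(insertModeSet: Boolean,
                      isNonStrictMode: Boolean,
                      enableBulkInsert: Option[String],
                      dropDuplicate: Option[String]): String = {
    // An unset "hoodie.sql.insert.mode" should not veto bulk insert;
    // only an explicitly set mode other than non-strict does.
    val modeAllowsBulk = !insertModeSet || isNonStrictMode
    // Unset configs default to the bulk-insert-compatible value.
    val bulkEnabled = enableBulkInsert.getOrElse("true") == "true"
    val dedup = dropDuplicate.getOrElse("false") == "true"
    if (modeAllowsBulk && bulkEnabled && !dedup) BulkInsert
    else FallbackToDeduce
  }
}
```

With nothing set, this picks bulk insert; explicitly setting an incompatible mode (e.g. a strict insert mode) flips it to the fallback.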
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]