alexeykudinkin commented on code in PR #7871:
URL: https://github.com/apache/hudi/pull/7871#discussion_r1104995855
##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala:
##########
@@ -28,97 +28,125 @@ import org.apache.hudi.config.HoodieWriteConfig.{AVRO_SCHEMA_VALIDATE_ENABLE, TB
import org.apache.hudi.exception.HoodieException
import org.apache.hudi.hive.HiveSyncConfigHolder
import org.apache.hudi.sync.common.HoodieSyncConfig
+import org.apache.hudi.util.JFunction.scalaFunction1Noop
import org.apache.hudi.{AvroConversionUtils, DataSourceWriteOptions, HoodieSparkSqlWriter, SparkAdapterSupport}
-import org.apache.spark.sql.HoodieCatalystExpressionUtils.MatchCast
+import org.apache.spark.sql.HoodieCatalystExpressionUtils.{MatchCast, attributeEquals}
import org.apache.spark.sql._
-import org.apache.spark.sql.catalyst.TableIdentifier
-import org.apache.spark.sql.catalyst.analysis.Resolver
import org.apache.spark.sql.catalyst.catalog.HoodieCatalogTable
-import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, BoundReference, Cast, EqualTo, Expression, Literal}
+import org.apache.spark.sql.catalyst.expressions.BindReferences.bindReference
+import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, BoundReference, EqualTo, Expression, Literal, NamedExpression, PredicateHelper}
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.hudi.HoodieSqlCommonUtils._
-import org.apache.spark.sql.hudi.HoodieSqlUtils.getMergeIntoTargetTableId
+import org.apache.spark.sql.hudi.analysis.HoodieAnalysis.failAnalysis
import org.apache.spark.sql.hudi.ProvidesHoodieConfig.combineOptions
-import org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.CoercedAttributeReference
+import org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.{CoercedAttributeReference, encodeAsBase64String, stripCasting, toStructType}
import org.apache.spark.sql.hudi.command.payload.ExpressionPayload
import org.apache.spark.sql.hudi.command.payload.ExpressionPayload._
import org.apache.spark.sql.hudi.ProvidesHoodieConfig
-import org.apache.spark.sql.types.{BooleanType, StructType}
+import org.apache.spark.sql.types.{BooleanType, StructField, StructType}
import java.util.Base64
/**
- * The Command for hoodie MergeIntoTable.
- * The match on condition must contain the row key fields currently, so that we can use Hoodie
- * Index to speed up the performance.
+ * Hudi's implementation of the {@code MERGE INTO} (MIT) Spark SQL statement.
*
- * The main algorithm:
+ * NOTE: This implementation is restricted in some aspects to accommodate Hudi's crucial
+ * constraint of requiring every record to bear a unique primary key: the merging condition
+ * ([[mergeCondition]]) currently can only (and must) reference the target table's primary-key
+ * columns (this is necessary to leverage Hudi's upserting capabilities, including indexes)
*
- * We pushed down all the matched and not matched (condition, assignment) expression pairs to the
- * ExpressionPayload. And the matched (condition, assignment) expression pairs will execute in the
- * ExpressionPayload#combineAndGetUpdateValue to compute the result record, while the not matched
- * expression pairs will execute in the ExpressionPayload#getInsertValue.
+ * The following algorithm is applied:
*
- * For Mor table, it is a litter complex than this. The matched record also goes through the getInsertValue
- * and write append to the log. So the update actions & insert actions should process by the same
- * way. We pushed all the update actions & insert actions together to the
- * ExpressionPayload#getInsertValue.
+ * <ol>
+ * <li>The incoming batch ([[sourceTable]]) is reshaped such that it bears, correspondingly:
+ * a) the (required) "primary-key" column and b) the (optional) "pre-combine" column; this is
+ * required since the MIT statement does not restrict the [[sourceTable]]'s schema to be aligned
+ * with the [[targetTable]]'s, while Hudi's upserting flow expects such columns to be present</li>
Review Comment:
MIT poses no restriction that the source and target tables' schemas be aligned in any way; in other words, anything can be merged into the target table as long as the fields we're updating have matching data-types.
This is actually exactly the reason why schema validation is disabled for MIT.
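The schema-affinity point can be illustrated with a minimal sketch (plain Java, no Spark or Hudi APIs; `TargetRow`, `SourceRow`, the key-based lookup, and the `"unknown"` placeholder are all hypothetical simplifications): the source batch has a different shape than the target, and the merge works as long as the assigned field (`amount`) type-checks against the target's column.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of MERGE INTO semantics: source and target schemas
// differ, but merging succeeds because the updated field types match.
public class MergeSketch {
    // Target table schema: (id, name, amount), keyed by primary key `id`.
    record TargetRow(int id, String name, double amount) {}
    // Source batch schema differs: no "name" column at all.
    record SourceRow(int id, double amount) {}

    /** MATCHED: update only `amount`; NOT MATCHED: insert with a placeholder name. */
    static Map<Integer, TargetRow> merge(Map<Integer, TargetRow> target, List<SourceRow> source) {
        Map<Integer, TargetRow> out = new HashMap<>(target);
        for (SourceRow s : source) {
            TargetRow t = out.get(s.id());
            if (t != null) {
                // Matched on the primary key: update the assigned field only.
                out.put(s.id(), new TargetRow(t.id(), t.name(), s.amount()));
            } else {
                // Not matched: insert, filling the column the source lacks.
                out.put(s.id(), new TargetRow(s.id(), "unknown", s.amount()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, TargetRow> target = new HashMap<>();
        target.put(1, new TargetRow(1, "a", 10.0));
        Map<Integer, TargetRow> merged =
            merge(target, List.of(new SourceRow(1, 20.0), new SourceRow(2, 30.0)));
        System.out.println(merged.get(1)); // row 1 updated in place
        System.out.println(merged.get(2)); // row 2 inserted
    }
}
```

Note this is only the conceptual shape; the actual command compiles matched/not-matched clauses into an `ExpressionPayload` evaluated during Hudi's upsert, rather than looping over rows.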
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]