yihua commented on code in PR #12961:
URL: https://github.com/apache/hudi/pull/12961#discussion_r1996096804


##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/DeleteHoodieTableCommand.scala:
##########
@@ -41,9 +43,28 @@ case class DeleteHoodieTableCommand(dft: DeleteFromTable) extends HoodieLeafRunn
 
     val condition = sparkAdapter.extractDeleteCondition(dft)
 
+    val config = if (sparkSession.sqlContext.conf.getConfString(SPARK_SQL_OPTIMIZED_WRITES.key()
+      , SPARK_SQL_OPTIMIZED_WRITES.defaultValue()) == "true") {
+      buildHoodieDeleteTableConfig(catalogTable, sparkSession) + (SPARK_SQL_WRITES_PREPPED_KEY -> "true")
+    } else {
+      buildHoodieDeleteTableConfig(catalogTable, sparkSession)
+    }
+
+    val recordKeysStr = config.getOrElse(HoodieTableConfig.RECORDKEY_FIELDS.key(), "")
+    val recordKeys = recordKeysStr.split(",").filter(_.nonEmpty)
+
+    // get all columns which are used in condition
+    val conditionColumns = if (condition == null) {
+      Seq.empty[String]
+    } else {
+      condition.references.map(_.name).toSeq
+    }
+
+    val requiredCols = recordKeys ++ conditionColumns

Review Comment:
   Do we need the partition path data column(s) here too, or does
`_hoodie_partition_path` suffice? Have you verified that deletes work
properly on a partitioned table?
   
   Another nuance: the Spark `DELETE` statement does not use the precombine
field, so it is fine to leave that field out of `requiredCols` for now.
However, if we want to support streaming deletes, i.e., deletes ordered by
event time in the precombine field, the precombine field needs to be added
to the required columns in that case.
   
   It would be good to add javadocs/comments to clarify both points.
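
To make the two points above concrete, here is a minimal, self-contained sketch of how the projection could be assembled. It is only an illustration under stated assumptions: `DeleteProjectionSketch`, `requiredColumns`, and the `partitionColumns` and `precombineField` parameters are hypothetical names that do not exist in Hudi or in this PR, and whether the partition path data columns are actually needed is still the open question raised above.

```scala
// Hypothetical helper, not part of Hudi or this PR: it only illustrates
// which columns a DELETE projection might need to carry.
object DeleteProjectionSketch {

  /**
   * Builds the list of columns to project before issuing the delete.
   *
   * @param recordKeys       record key fields (always required to locate records)
   * @param conditionColumns columns referenced by the DELETE condition
   * @param partitionColumns partition path data columns; assumed to be needed
   *                         only if `_hoodie_partition_path` alone is not enough
   * @param precombineField  precombine field; not used by Spark `DELETE` today,
   *                         but would be needed for event-time based deletes
   */
  def requiredColumns(
      recordKeys: Seq[String],
      conditionColumns: Seq[String],
      partitionColumns: Seq[String] = Seq.empty,
      precombineField: Option[String] = None): Seq[String] = {
    (recordKeys ++ partitionColumns ++ conditionColumns ++ precombineField.toSeq)
      .filter(_.nonEmpty) // drop empty names, e.g. from splitting an empty config value
      .distinct           // a condition may reference a key or partition column again
  }
}
```

Calling `DeleteProjectionSketch.requiredColumns(Seq("id"), Seq("ts", "id"), Seq("dt"))` keeps each column once and in first-seen order, so a condition that filters on a key or partition column does not duplicate it in the projection.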



