Re: [PR] fix: route StringReplace through codegen dispatcher to fix empty-search divergence [datafusion-comet]

via GitHub Tue, 02 Jun 2026 06:50:46 -0700


mbutrovich commented on code in PR #4537:
URL: https://github.com/apache/datafusion-comet/pull/4537#discussion_r3341674022



##########
spark/src/main/scala/org/apache/comet/serde/strings.scala:
##########
@@ -102,6 +102,27 @@ object CometInitCap extends 
CometScalarFunction[InitCap]("initcap") {
   }
 }
 
+object CometStringReplace extends 
CometScalarFunction[StringReplace]("replace") {
+
+  override def getSupportLevel(expr: StringReplace): SupportLevel = 
Compatible()
+
+  override def convert(
+      expr: StringReplace,
+      inputs: Seq[Attribute],
+      binding: Boolean): Option[Expr] = {
+    if (CometConf.isExprAllowIncompat(getExprConfigName(expr))) {
+      // Native path: faster, but DataFusion's `replace` diverges from Spark 
when the search
+      // string is empty — it inserts the replacement between every character 
and at both
+      // boundaries, while Spark short-circuits and returns the source 
unchanged. Opt-in.
+      super.convert(expr, inputs, binding)
+    } else {
+      // Default: route through the codegen dispatcher so Spark's own 
doGenCode handles the
+      // empty-search edge case. 
https://github.com/apache/datafusion-comet/issues/4497

Review Comment:
   Do we want to link to closed issues in the code (this PR closes it). I think 
we should make sure the code is self-documenting, with comments explaining why 
code exists if it's non-obvious. Also comments that say things like "default" 
are a code smell if that ever changes.



##########
spark/src/main/scala/org/apache/comet/serde/strings.scala:
##########
@@ -102,6 +102,27 @@ object CometInitCap extends 
CometScalarFunction[InitCap]("initcap") {
   }
 }
 
+object CometStringReplace extends 
CometScalarFunction[StringReplace]("replace") {
+
+  override def getSupportLevel(expr: StringReplace): SupportLevel = 
Compatible()
+
+  override def convert(
+      expr: StringReplace,
+      inputs: Seq[Attribute],
+      binding: Boolean): Option[Expr] = {
+    if (CometConf.isExprAllowIncompat(getExprConfigName(expr))) {
+      // Native path: faster, but DataFusion's `replace` diverges from Spark 
when the search

Review Comment:
   Is this already documented somewhere else? This doesn't seem like the right 
place for it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix: route StringReplace through codegen dispatcher to fix empty-search divergence [datafusion-comet]

Reply via email to