coderfender commented on code in PR #2831:
URL: https://github.com/apache/datafusion-comet/pull/2831#discussion_r2666346598
##########
spark/src/main/scala/org/apache/comet/serde/strings.scala:
##########
@@ -286,3 +286,81 @@ trait CommonStringExprs {
}
}
}
+
+object CometRegExpExtract extends CometExpressionSerde[RegExpExtract] {
+ override def getSupportLevel(expr: RegExpExtract): SupportLevel = {
+ // Check if the pattern is compatible with Spark or allow incompatible patterns
+ expr.regexp match {
+ case Literal(pattern, DataTypes.StringType) =>
+ if (!RegExp.isSupportedPattern(pattern.toString) &&
+ !CometConf.COMET_REGEXP_ALLOW_INCOMPATIBLE.get()) {
+ withInfo(
+ expr,
+ s"Regexp pattern $pattern is not compatible with Spark. " +
+ s"Set ${CometConf.COMET_REGEXP_ALLOW_INCOMPATIBLE.key}=true " +
+ "to allow it anyway.")
+ return Incompatible()
+ }
+ case _ =>
+ return Unsupported(Some("Only literal regexp patterns are supported"))
Review Comment:
Minor nit (nice to have): "regexp" in this message could be spelled out as
"regular expression" or "regex", which should be clearer for non-native
English speakers.
##########
spark/src/main/scala/org/apache/comet/serde/strings.scala:
##########
@@ -286,3 +286,83 @@ trait CommonStringExprs {
}
}
}
+
+object CometRegExpExtract extends CometExpressionSerde[RegExpExtract] {
+ override def getSupportLevel(expr: RegExpExtract): SupportLevel = {
+ // Check if the pattern is compatible with Spark or allow incompatible patterns
+ expr.regexp match {
+ case Literal(pattern, DataTypes.StringType) =>
+ if (!RegExp.isSupportedPattern(pattern.toString) &&
+ !CometConf.COMET_REGEXP_ALLOW_INCOMPATIBLE.get()) {
+ withInfo(
+ expr,
+ s"Regexp pattern $pattern is not compatible with Spark. " +
+ s"Set ${CometConf.COMET_REGEXP_ALLOW_INCOMPATIBLE.key}=true " +
+ "to allow it anyway.")
+ return Incompatible()
+ }
+ case _ =>
+ return Unsupported(Some("Only literal regexp patterns are supported"))
+ }
+
+ // Check if idx is a literal
+ expr.idx match {
+ case Literal(_, DataTypes.IntegerType) =>
+ Compatible()
+ case _ =>
+ Unsupported(Some("Only literal group index is supported"))
Review Comment:
thank you
##########
docs/source/user-guide/latest/configs.md:
##########
@@ -294,6 +294,8 @@ These settings can be used to determine which parts of the plan are accelerated
| `spark.comet.expression.RLike.enabled` | Enable Comet acceleration for `RLike` | true |
| `spark.comet.expression.Rand.enabled` | Enable Comet acceleration for `Rand` | true |
| `spark.comet.expression.Randn.enabled` | Enable Comet acceleration for `Randn` | true |
+| `spark.comet.expression.RegExpExtract.enabled` | Enable Comet acceleration for `RegExpExtract` | true |
+| `spark.comet.expression.RegExpExtractAll.enabled` | Enable Comet acceleration for `RegExpExtractAll` | true |
Review Comment:
Perhaps these two configs should default to false, given that
COMET_REGEXP_ALLOW_INCOMPATIBLE defaults to false?
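
For context, here is a rough sketch of how the pattern check and the
allow-incompatible flag appear to interact, based only on the
`getSupportLevel` hunk quoted earlier in this review. The types and the
`supportLevel` function are simplified, hypothetical stand-ins, not Comet's
actual classes, and `isSupportedPattern` is passed in as a parameter because
its real behaviour is not shown in the diff.

```scala
// Simplified stand-ins for Comet's support levels (hypothetical, not the real API).
sealed trait SupportLevel
case object Compatible extends SupportLevel
case object Incompatible extends SupportLevel
final case class Unsupported(reason: String) extends SupportLevel

object RegexpGateSketch {
  // Mirrors the order of checks in the quoted hunk:
  // 1. the pattern must be a literal,
  // 2. an unsupported pattern is Incompatible unless the allow flag is set,
  // 3. the group index must be a literal.
  def supportLevel(
      literalPattern: Option[String],
      idxIsLiteral: Boolean,
      allowIncompatible: Boolean,
      isSupportedPattern: String => Boolean): SupportLevel =
    literalPattern match {
      case None =>
        Unsupported("Only literal regexp patterns are supported")
      case Some(p) if !isSupportedPattern(p) && !allowIncompatible =>
        Incompatible
      case Some(_) if !idxIsLiteral =>
        Unsupported("Only literal group index is supported")
      case Some(_) =>
        Compatible
    }
}
```

Under these assumptions, the allow-incompatible flag is what decides whether a
pattern that fails the compatibility check is reported as `Incompatible` or
allowed through, independently of the per-expression `enabled` setting.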
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]