andygrove commented on code in PR #1700:
URL: https://github.com/apache/datafusion-comet/pull/1700#discussion_r2070420243
##########
spark/src/test/scala/org/apache/comet/CometFuzzTestSuite.scala:
##########
@@ -188,6 +188,22 @@ class CometFuzzTestSuite extends CometTestBase with AdaptiveSparkPlanHelper {
}
}
+ test("regexp_replace") {
+ withSQLConf(CometConf.COMET_REGEXP_ALLOW_INCOMPATIBLE.key -> "true") {
+ val df = spark.read.parquet(filename)
+ df.createOrReplaceTempView("t1")
+ // We want to make sure that the schema generator wasn't modified to accidentally omit
+ // StringType, since then this test would not run any queries and silently pass.
+ var testedString = false
+ for (field <- df.schema.fields if field.dataType == StringType) {
+ testedString = true
+ val sql = s"SELECT regexp_replace(${field.name}, 'a', 'b') FROM t1"
Review Comment:
Perhaps we should improve the logic in `ParquetGenerator` to also generate
ASCII strings. The current logic for `StringType` is:
```scala
case DataTypes.StringType =>
  Range(0, numRows).map(_ => {
    r.nextInt(10) match {
      case 0 if options.allowNull => null
      case 1 => r.nextInt().toByte.toString
      case 2 => r.nextLong().toString
      case 3 => r.nextDouble().toString
      case _ => r.nextString(8)
    }
  })
```
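The call to `r.nextString(8)` generates strings containing characters in
roughly the range 0 through 0xD800 (everything below the UTF-16 surrogate
block), so while it could theoretically generate `a`, it will be rare: on the
order of 1 in 55,000 per character. As a result, `regexp_replace(..., 'a', 'b')`
will almost never find a match in the generated data.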
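One possible shape for this, as a rough sketch only: add a branch that produces printable ASCII strings alongside the existing cases. The names `AsciiStringSketch`, `nextAsciiString`, and `genStrings` are hypothetical and not part of `ParquetGenerator`; the `allowNull` parameter stands in for `options.allowNull`, and the choice of `case 4` and a length of 8 is arbitrary.

```scala
import scala.util.Random

object AsciiStringSketch {

  // Hypothetical helper (not in ParquetGenerator): builds a string of
  // `length` printable ASCII characters, 0x20 (space) through 0x7E (~).
  def nextAsciiString(r: Random, length: Int): String = {
    val sb = new StringBuilder(length)
    var i = 0
    while (i < length) {
      // nextInt(95) yields 0..94, so the char falls in 0x20..0x7E
      sb.append((r.nextInt(0x7F - 0x20) + 0x20).toChar)
      i += 1
    }
    sb.toString
  }

  // Sketch of the StringType branch with one extra case for ASCII values.
  // `allowNull` is an assumed stand-in for options.allowNull.
  def genStrings(r: Random, numRows: Int, allowNull: Boolean): Seq[String] =
    Range(0, numRows).map { _ =>
      r.nextInt(10) match {
        case 0 if allowNull => null
        case 1 => r.nextInt().toByte.toString
        case 2 => r.nextLong().toString
        case 3 => r.nextDouble().toString
        case 4 => nextAsciiString(r, 8) // new: printable ASCII only
        case _ => r.nextString(8)
      }
    }
}
```

With something like this, about one row in ten would hold an 8-character ASCII string, so a pattern such as `'a'` would be matched far more often than with `r.nextString(8)` alone.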