Re: [PR] feat: regexp_replace() expression with no starting offset [datafusion-comet]

via GitHub Thu, 01 May 2025 08:47:24 -0700


andygrove commented on code in PR #1700:
URL: https://github.com/apache/datafusion-comet/pull/1700#discussion_r2070420243



##########
spark/src/test/scala/org/apache/comet/CometFuzzTestSuite.scala:
##########
@@ -188,6 +188,22 @@ class CometFuzzTestSuite extends CometTestBase with AdaptiveSparkPlanHelper {
     }
   }
 
+  test("regexp_replace") {
+    withSQLConf(CometConf.COMET_REGEXP_ALLOW_INCOMPATIBLE.key -> "true") {
+      val df = spark.read.parquet(filename)
+      df.createOrReplaceTempView("t1")
+      // We want to make sure that the schema generator wasn't modified to accidentally omit
+      // StringType, since then this test would not run any queries and silently pass.
+      var testedString = false
+      for (field <- df.schema.fields if field.dataType == StringType) {
+        testedString = true
+        val sql = s"SELECT regexp_replace(${field.name}, 'a', 'b') FROM t1"

Review Comment:
   Perhaps we should improve the logic in `ParquetGenerator` to also generate ASCII strings:
   
   ```scala
         case DataTypes.StringType =>
           Range(0, numRows).map(_ => {
             r.nextInt(10) match {
               case 0 if options.allowNull => null
               case 1 => r.nextInt().toByte.toString
               case 2 => r.nextLong().toString
               case 3 => r.nextDouble().toString
               case _ => r.nextString(8)
             }
           })
   ```
   
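   The call to `r.nextString(8)` generates strings containing characters in roughly the range 0 through 0xD800, so while it could theoretically generate `a`, this will be rare, and the `regexp_replace(..., 'a', 'b')` query in this test would seldom find anything to replace.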
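
As a rough sketch of the suggestion above (not the actual `ParquetGenerator` change), the `StringType` branch could gain one extra case that yields ASCII-only strings. The names `AsciiStringSketch`, `nextAsciiString`, and `generateStrings` are hypothetical and introduced only for illustration, and the `allowNull` parameter stands in for `options.allowNull`:

```scala
import scala.util.Random

object AsciiStringSketch {

  // Hypothetical helper (not part of ParquetGenerator): builds a string of
  // `length` printable ASCII characters, 0x20 (space) through 0x7E (~).
  def nextAsciiString(r: Random, length: Int): String = {
    val sb = new StringBuilder(length)
    for (_ <- 0 until length) {
      // nextInt(95) yields 0..94, so the result stays within 0x20..0x7E
      sb.append((0x20 + r.nextInt(0x7F - 0x20)).toChar)
    }
    sb.toString
  }

  // Mirrors the shape of the quoted StringType branch, with one added case
  // (case 4) that produces ASCII-only strings. `allowNull` is an assumed
  // stand-in for options.allowNull.
  def generateStrings(r: Random, numRows: Int, allowNull: Boolean): Seq[String] =
    Range(0, numRows).map(_ => {
      r.nextInt(10) match {
        case 0 if allowNull => null
        case 1 => r.nextInt().toByte.toString
        case 2 => r.nextLong().toString
        case 3 => r.nextDouble().toString
        case 4 => nextAsciiString(r, 8)
        case _ => r.nextString(8)
      }
    })
}
```

With a case like this, roughly one in ten generated values would be drawn from printable ASCII, so a pattern such as `'a'` has a realistic chance of matching. If only letters and digits are wanted, `r.alphanumeric.take(8).mkString` from `scala.util.Random` would be a shorter alternative.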



