[ https://issues.apache.org/jira/browse/SPARK-39107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532462#comment-17532462 ]
Apache Spark commented on SPARK-39107:
--------------------------------------

User 'LorenzoMartini' has created a pull request for this issue:
https://github.com/apache/spark/pull/36457

> Silent change in regexp_replace's handling of empty strings
> -----------------------------------------------------------
>
>                 Key: SPARK-39107
>                 URL: https://issues.apache.org/jira/browse/SPARK-39107
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.2
>            Reporter: Willi Raschkowski
>            Priority: Major
>              Labels: correctness
>
> Hi, we just upgraded from 3.0.2 to 3.1.2 and noticed a silent behavior change
> that a) seems incorrect, and b) is undocumented in the [migration
> guide|https://spark.apache.org/docs/latest/sql-migration-guide.html]:
> {code:title=3.0.2}
> scala> val df = spark.sql("SELECT '' AS col")
> df: org.apache.spark.sql.DataFrame = [col: string]
> scala> df.withColumn("replaced", regexp_replace(col("col"), "^$", "<empty>")).show
> +---+--------+
> |col|replaced|
> +---+--------+
> |   | <empty>|
> +---+--------+
> {code}
> {code:title=3.1.2}
> scala> val df = spark.sql("SELECT '' AS col")
> df: org.apache.spark.sql.DataFrame = [col: string]
> scala> df.withColumn("replaced", regexp_replace(col("col"), "^$", "<empty>")).show
> +---+--------+
> |col|replaced|
> +---+--------+
> |   |        |
> +---+--------+
> {code}
> Note, the regular expression {{^$}} should match the empty string, but
> doesn't in version 3.1. E.g. this is the Java behavior:
> {code}
> scala> "".replaceAll("^$", "<empty>");
> res1: String = <empty>
> {code}