Eric Yang created SPARK-57500:
---------------------------------
Summary: MySQL JDBC pushdown returns wrong results for =, <>, <,
IN on string values containing a backslash
Key: SPARK-57500
URL: https://issues.apache.org/jira/browse/SPARK-57500
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 5.0.0
Reporter: Eric Yang
String-comparison predicate pushdown inlines the literal using quote-only
escaping (`JdbcDialect.escapeSql` doubles ' but not \). MySQL treats \ as an
escape character inside string literals, so a pushed-down literal containing a
backslash is mis-parsed and silently matches the wrong value.
For example: WHERE c = 'a\b' is pushed to MySQL as `c` = 'a\b' , which MySQL
parses as `c` = 'a<backspace>'. The row whose value is literally a\b is dropped
(wrong results). Common triggers: Windows paths (C:\...), regex/JSON strings.
This is the comparison/IN sibling of SPARK-57332, which fixed the same class
only for the LIKE family (STARTS_WITH/ENDS_WITH/CONTAINS). The general literal
path (compileValue/escapeSql) was left unescaped, so =, <>, <, <=, >, >=, IN
are still wrong.
Affects MySQL (and MariaDB via MySQLDialect). Standard-SQL dialects are
unaffected (backslash is literal there).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]