Eric Yang created SPARK-57500:
---------------------------------

             Summary: MySQL JDBC pushdown returns wrong results for =, <>, <, 
IN on string values containing a backslash
                 Key: SPARK-57500
                 URL: https://issues.apache.org/jira/browse/SPARK-57500
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 5.0.0
            Reporter: Eric Yang


String-comparison predicate pushdown inlines the literal using quote-only 
escaping (`JdbcDialect.escapeSql` doubles ' but not \). MySQL treats \ as an 
escape character inside string literals, so a pushed-down literal containing a 
backslash is mis-parsed and silently matches the wrong value.
 
For example: WHERE c = 'a\b' is pushed to MySQL as `c` = 'a\b', which MySQL 
parses as `c` = 'a<backspace>'. The row whose value is literally a\b no longer 
matches and is silently dropped from the result. Common triggers: Windows paths 
(C:\...), regex/JSON strings.
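
The mis-parse can be reproduced without a database. The following is a minimal 
Python sketch, not Spark or MySQL code: `mysql_parse_literal_body` is a 
hypothetical helper that approximates MySQL's documented backslash escapes for 
a string-literal body under the default sql_mode (that is, without 
NO_BACKSLASH_ESCAPES), and `escape_quote_only` is a hypothetical stand-in for 
the quote-doubling behaviour described above.

```python
# Approximation of MySQL's backslash escapes inside a string literal
# (default sql_mode). Illustrative only; not the server's actual parser.
MYSQL_ESCAPES = {
    "0": "\0", "'": "'", '"': '"', "b": "\b", "n": "\n",
    "r": "\r", "t": "\t", "Z": "\x1a", "\\": "\\",
}


def mysql_parse_literal_body(body: str) -> str:
    """Return the value MySQL would see for the text between the quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in "%_":
                # \% and \_ keep their backslash outside pattern matching.
                out.append("\\" + nxt)
            else:
                # Known escapes are translated; for any other character
                # the backslash is simply dropped.
                out.append(MYSQL_ESCAPES.get(nxt, nxt))
            i += 2
        elif ch == "'" and body[i + 1:i + 2] == "'":
            out.append("'")  # a doubled quote is one literal quote
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def escape_quote_only(value: str) -> str:
    """Hypothetical stand-in for quote-only escaping: doubles ' only."""
    return value.replace("'", "''")


wanted = "a\\b"  # three characters: a, backslash, b
seen_by_mysql = mysql_parse_literal_body(escape_quote_only(wanted))
print(repr(wanted), "->", repr(seen_by_mysql))
```

Under these assumptions the value that reaches the comparison is `a` followed 
by a backspace character, so it no longer equals the stored a\b.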
 
This is the comparison/IN sibling of SPARK-57332, which fixed the same class 
of bug, but only for the LIKE family (STARTS_WITH/ENDS_WITH/CONTAINS). The 
general literal path (compileValue/escapeSql) was left unescaped, so =, <>, <, 
<=, >, >= and IN still return wrong results.
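
Because every comparison operator and IN inline their literals through one 
shared path, escaping the backslash in that path would cover all of them at 
once. The Python sketch below illustrates the idea only; `compile_value`, 
`compile_comparison`, `compile_in` and the `escape_backslash` flag are 
hypothetical names and do not reflect Spark's actual API or the eventual fix.

```python
def compile_value(value, escape_backslash: bool) -> str:
    """Render a value as a SQL literal (hypothetical helper)."""
    if isinstance(value, str):
        escaped = value
        if escape_backslash:
            # MySQL-style dialects: double the backslash so it survives
            # the server's string-literal parsing.
            escaped = escaped.replace("\\", "\\\\")
        # All dialects: double the single quote.
        escaped = escaped.replace("'", "''")
        return f"'{escaped}'"
    return str(value)


def compile_comparison(column: str, op: str, value, escape_backslash: bool) -> str:
    """Build `col` <op> <literal> for =, <>, <, <=, >, >=."""
    return f"`{column}` {op} {compile_value(value, escape_backslash)}"


def compile_in(column: str, values, escape_backslash: bool) -> str:
    """Build `col` IN (<literal>, ...) through the same literal path."""
    items = ", ".join(compile_value(v, escape_backslash) for v in values)
    return f"`{column}` IN ({items})"


value = "a\\b"  # three characters: a, backslash, b
print(compile_comparison("c", "=", value, escape_backslash=False))
print(compile_comparison("c", "=", value, escape_backslash=True))
print(compile_in("c", [value, "x"], escape_backslash=True))
```

With the flag off the literal is emitted with a single backslash, as in the 
report; with it on, the backslash is doubled in both the comparison and the 
IN list.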
 
Affects MySQL (and MariaDB via MySQLDialect). Standard-SQL dialects are 
unaffected (backslash is literal there).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
