[
https://issues.apache.org/jira/browse/SPARK-57500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eric Yang updated SPARK-57500:
------------------------------
Description:
String-comparison predicate pushdown inlines the literal using quote-only
escaping (`JdbcDialect.escapeSql` doubles ' but not ). MySQL treats \ as an
escape character inside string literals, so a pushed-down literal containing a
backslash is mis-parsed and silently matches the wrong value.
For example: WHERE c = 'a\b' is pushed to MySQL as `c` = 'a\b' , which MySQL
parses as `c` = 'a<backspace>'. The row whose value is literally a\b is dropped
(wrong results). Common triggers: Windows paths (C:\...), regex/JSON strings.
This is the comparison/IN sibling of SPARK-57332 and
https://issues.apache.org/jira/browse/SPARK-57287 , which fixed the same class
only for the LIKE family (STARTS_WITH/ENDS_WITH/CONTAINS). The general literal
path (compileValue/escapeSql) was left unescaped, so =, <>, <, <=, >, >=, IN
are still wrong.
Affects MySQL (and MariaDB via MySQLDialect). Standard-SQL dialects are
unaffected (backslash is literal there).
was:
String-comparison predicate pushdown inlines the literal using quote-only
escaping (`JdbcDialect.escapeSql` doubles ' but not \). MySQL treats \ as an
escape character inside string literals, so a pushed-down literal containing a
backslash is mis-parsed and silently matches the wrong value.
For example: WHERE c = 'a\b' is pushed to MySQL as `c` = 'a\b' , which MySQL
parses as `c` = 'a<backspace>'. The row whose value is literally a\b is dropped
(wrong results). Common triggers: Windows paths (C:\...), regex/JSON strings.
This is the comparison/IN sibling of SPARK-57332, which fixed the same class
only for the LIKE family (STARTS_WITH/ENDS_WITH/CONTAINS). The general literal
path (compileValue/escapeSql) was left unescaped, so =, <>, <, <=, >, >=, IN
are still wrong.
Affects MySQL (and MariaDB via MySQLDialect). Standard-SQL dialects are
unaffected (backslash is literal there).
> MySQL JDBC pushdown returns wrong results for =, <>, <, IN on string values
> containing a backslash
> --------------------------------------------------------------------------------------------------
>
> Key: SPARK-57500
> URL: https://issues.apache.org/jira/browse/SPARK-57500
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 5.0.0
> Reporter: Eric Yang
> Priority: Major
>
> String-comparison predicate pushdown inlines the literal using quote-only
> escaping (`JdbcDialect.escapeSql` doubles ' but not ). MySQL treats \ as an
> escape character inside string literals, so a pushed-down literal containing
> a backslash is mis-parsed and silently matches the wrong value.
>
> For example: WHERE c = 'a\b' is pushed to MySQL as `c` = 'a\b' , which MySQL
> parses as `c` = 'a<backspace>'. The row whose value is literally a\b is
> dropped (wrong results). Common triggers: Windows paths (C:\...), regex/JSON
> strings.
>
> This is the comparison/IN sibling of SPARK-57332 and
> https://issues.apache.org/jira/browse/SPARK-57287 , which fixed the same
> class only for the LIKE family (STARTS_WITH/ENDS_WITH/CONTAINS). The general
> literal path (compileValue/escapeSql) was left unescaped, so =, <>, <, <=, >,
> >=, IN are still wrong.
>
> Affects MySQL (and MariaDB via MySQLDialect). Standard-SQL dialects are
> unaffected (backslash is literal there).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]