Noah Kawasaki created SPARK-34102:
-------------------------------------
Summary: Spark SQL cannot escape both \ and other special characters
Key: SPARK-34102
URL: https://issues.apache.org/jira/browse/SPARK-34102
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.0.1, 2.4.5, 2.3.0, 2.2.2, 2.1.3, 2.0.2
Reporter: Noah Kawasaki
Spark's string-literal parsing does not correctly escape backslashes and other
special characters at the same time. This is an extension of this issue:
https://issues.apache.org/jira/browse/SPARK-17647
Depending on how spark.sql.parser.escapedStringLiterals is set, either
backslashes come through a string literal correctly but other special
characters do not, or other special characters are correctly escaped but
backslashes are not. So you have to choose which behavior you care about more.
I have tested Spark versions 2.1, 2.2, 2.3, 2.4, and 3.0, and all of them
exhibit the issue:
{code:sql}
-- These do not return the expected backslash
SET spark.sql.parser.escapedStringLiterals=false;
SELECT '\\';
> \
(should return \\)
SELECT 'hi\hi';
> hihi
(should return hi\hi)
-- These are correctly escaped
SELECT '\"';
> "
SELECT '\'';
> '
{code}
If I flip the setting:
{code:sql}
-- These now work
SET spark.sql.parser.escapedStringLiterals=true;
SELECT '\\';
> \\
SELECT 'hi\hi';
> hi\hi
-- These are now not correctly escaped
SELECT '\"';
> \"
(should return ")
SELECT '\'';
> \'
(should return ')
{code}
So basically we have to choose:
SET spark.sql.parser.escapedStringLiterals=false; if we want other special
characters correctly escaped but not backslashes
SET spark.sql.parser.escapedStringLiterals=true; if we want backslashes
correctly escaped but not other special characters
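To make the two behaviors concrete, here is a minimal Python sketch (not Spark source code; the function name and the exact handling of unknown escapes are illustrative assumptions) that reproduces the outputs shown in the examples above for both settings of spark.sql.parser.escapedStringLiterals:

```python
def parse_literal(body: str, escaped_string_literals: bool) -> str:
    """Approximate how Spark interprets the body of a quoted string literal.

    body: the characters between the quotes, exactly as typed.
    escaped_string_literals: value of spark.sql.parser.escapedStringLiterals.
    """
    if escaped_string_literals:
        # escapedStringLiterals=true: backslashes are kept verbatim,
        # so \\ stays \\ and \" stays \" (the reported wrong result for quotes).
        return body
    # escapedStringLiterals=false: a backslash starts an escape sequence;
    # the backslash is dropped and the next character is kept as-is,
    # so \\ -> \, \" -> ", and \h -> h (the reported wrong result for hi\hi).
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


# Reproduces the observed outputs from the SQL examples above:
print(parse_literal(r"\\", False))     # \
print(parse_literal(r"hi\hi", False))  # hihi
print(parse_literal(r"\\", True))      # \\
print(parse_literal(r"hi\hi", True))   # hi\hi
```

Neither branch handles both cases at once, which is the core of this report: there is no setting under which `\\` yields `\` (or `\\`, per preference) while `\"` simultaneously yields `"`.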