[
https://issues.apache.org/jira/browse/SPARK-37344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444225#comment-17444225
]
angerszhu edited comment on SPARK-37344 at 11/16/21, 8:05 AM:
--------------------------------------------------------------
For the same SQL:
{code}
explain extended select split('dawdawdawd','\\\\;');
{code}
In Hive 1.2:
{code}
OK
ABSTRACT SYNTAX TREE:
TOK_QUERY
TOK_INSERT
TOK_DESTINATION
TOK_DIR
TOK_TMP_FILE
TOK_SELECT
TOK_SELEXPR
TOK_FUNCTION
split
'dawdawdawd'
'\\\;'
{code}
In Hive 3:
{code}
OK
ABSTRACT SYNTAX TREE:
TOK_QUERY
TOK_INSERT
TOK_DESTINATION
TOK_DIR
TOK_TMP_FILE
TOK_SELECT
TOK_SELEXPR
TOK_FUNCTION
split
'dawdawdawd'
'\\\\;'
{code}
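The two ASTs differ only in the pattern literal: Hive 1.2 shows three backslashes before the semicolon, Hive 3 shows four. As a rough illustration only (this is not Hive's or Spark's actual code, and `unescape_once` is a hypothetical helper), a single simplified unescaping pass maps those two literals to the two patterns seen in the Spark 2 and Spark 3 parsed plans quoted in the issue description:

```python
def unescape_once(literal: str) -> str:
    # Simplified assumption: a backslash followed by any character yields
    # that character. Real SQL parsers also handle sequences such as
    # newline or tab escapes, which are ignored here.
    out = []
    i = 0
    while i < len(literal):
        ch = literal[i]
        if ch == "\\" and i + 1 < len(literal):
            out.append(literal[i + 1])  # keep the escaped character
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Raw strings keep the backslashes exactly as they appear in the ASTs.
hive3_literal = r"\\\\;"   # four backslashes, as in the Hive 3 AST
hive12_literal = r"\\\;"   # three backslashes, as in the Hive 1.2 AST

print(unescape_once(hive3_literal))   # two backslashes then ';'
print(unescape_once(hive12_literal))  # one backslash then ';'
```

Under this assumption, the difference would already be present in the literal that reaches the parser, not in how the parser unescapes it.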
was (Author: angerszhuuu):
In the latest master branch:
{code}
== Parsed Logical Plan ==
'Project [unresolvedalias('split('name, \;), None)]
+- 'UnresolvedRelation [split_test], [], false
== Analyzed Logical Plan ==
split(name, \;, -1): array<string>
Project [split(name#225, \;, -1) AS split(name, \;, -1)#226]
+- SubqueryAlias spark_catalog.default.split_test
+- Relation default.split_test[id#224,name#225] parquet
== Optimized Logical Plan ==
Project [split(name#225, \;, -1) AS split(name, \;, -1)#226]
+- Relation default.split_test[id#224,name#225] parquet
== Physical Plan ==
*(1) Project [split(name#225, \;, -1) AS split(name, \;, -1)#226]
+- *(1) ColumnarToRow
+- FileScan parquet default.split_test[name#225] Batched: true, DataFilters:
[], Format: Parquet, Location: InMemoryFileIndex(1
paths)[file:/Users/yi.zhu/Documents/project/Angerszhuuuu/spark/sql/core/spark...,
PartitionFilters: [], PushedFilters: [], ReadSchema: struct<name:string>
{code}
> split function behave differently between spark 2.3 and spark 3.2
> -----------------------------------------------------------------
>
> Key: SPARK-37344
> URL: https://issues.apache.org/jira/browse/SPARK-37344
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.1, 3.1.2, 3.2.0
> Reporter: ocean
> Priority: Major
> Labels: incorrect
>
> When the split function is used in SQL, it behaves differently between Spark 2.3 and 3.2,
> which produces incorrect results.
> The following SQL reproduces the problem:
>
> create table split_test ( id int,name string)
> insert into split_test values(1,"abc;def")
> explain extended select split(name,'\\\\;') from split_test
>
> spark3:
> spark-sql> Explain extended select split(name,'\\\\;') from split_test;
> == Parsed Logical Plan ==
> 'Project [unresolvedalias('split('name, \\;), None)]
> +- 'UnresolvedRelation [split_test], [], false
>
> spark2:
>
> spark-sql> Explain extended select split(name,'\\\\;') from split_test;
> == Parsed Logical Plan ==
> 'Project [unresolvedalias('split('name, \;), None)]
> +- 'UnresolvedRelation split_test
>
> It looks like escape characters are handled differently.