[
https://issues.apache.org/jira/browse/SPARK-39241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540601#comment-17540601
]
Yuming Wang commented on SPARK-39241:
-------------------------------------
I can't reproduce this issue:
{code:scala}
spark.sql(
  """
    |CREATE EXTERNAL TABLE tmp(f1 STRING)
    |PARTITIONED BY (dt STRING)
    |ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
    |STORED AS TEXTFILE
    |LOCATION 'file://tmp/tmp/'
  """.stripMargin)
spark.sql("""insert into table tmp partition(dt="2022051000") values("1")""")
spark.sql("select * from tmp where dt like '202205100%'").show()
spark.sql("select * from tmp where dt like any ('202205100%')").show()
{code}
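For reference, both queries above are expected to return the inserted row, because `%` in a LIKE pattern matches any sequence of characters and LIKE ANY is true when at least one pattern matches. The following is a minimal, Spark-free sketch of those semantics only. It is not how Spark evaluates LIKE or prunes partitions; the names `LikeSketch`, `likeToRegex`, `likeMatches` and `likeAnyMatches` are hypothetical, and escape characters are not handled.

```scala
import java.util.regex.Pattern

// Illustrative helper only (hypothetical names, not part of Spark).
object LikeSketch {
  // Translate a SQL LIKE pattern into a Java regex:
  // '%' matches any sequence of characters, '_' matches exactly one,
  // and every other character is matched literally.
  // Escape sequences such as '\%' are deliberately not handled here.
  def likeToRegex(pattern: String): String =
    pattern.toSeq.map {
      case '%' => ".*"
      case '_' => "."
      case c   => Pattern.quote(c.toString)
    }.mkString

  // True when the whole value matches the LIKE pattern.
  def likeMatches(value: String, pattern: String): Boolean =
    Pattern.compile(likeToRegex(pattern), Pattern.DOTALL).matcher(value).matches()

  // LIKE ANY is true when at least one of the patterns matches.
  def likeAnyMatches(value: String, patterns: Seq[String]): Boolean =
    patterns.exists(likeMatches(value, _))
}
```

Under this sketch, `LikeSketch.likeMatches("2022051000", "202205100%")` and `LikeSketch.likeAnyMatches("2022051000", Seq("202205100%"))` both evaluate to `true`, which is the behaviour the two queries are expected to share.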
> Spark SQL 'Like' operator behaves wrongly while filtering on partitioned
> column after Spark 3.1
> -----------------------------------------------------------------------------------------------
>
> Key: SPARK-39241
> URL: https://issues.apache.org/jira/browse/SPARK-39241
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.2
> Environment: *Environment: EMR*
> Release label:emr-6.5.0
> Hadoop distribution:Amazon 3.2.1
> Applications:{*}Spark 3.1.2{*}, Hive 3.1.2, Livy 0.7.1
> Reporter: Dmitry Gorbatsevich
> Priority: Major
>
> It seems that the introduction of "like any" in Spark 3.1 broke the behaviour
> of "like" when filtering on a partitioned column. Here is an example:
> 1. Create test table:
> {code:java}
> scala> spark.sql(
> | """
> | CREATE EXTERNAL TABLE tmp(
> | f1 STRING
> | )
> | PARTITIONED BY (dt STRING)
> | ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> | LINES TERMINATED BY '\n'
> | STORED AS TEXTFILE
> | LOCATION 's3://vlg-data-us-east-1/tmp/tmp/';
> | """)
> res2: org.apache.spark.sql.DataFrame = []{code}
> 2. Insert a row into it:
> {code:java}
> scala> spark.sql(
> | """
> | insert into table tmp partition(dt="2022051000") values("1")
> | """
> | )
> res3: org.apache.spark.sql.DataFrame = [] {code}
> 3. Do select using 'like':
> {code:java}
> scala> spark.sql(
> | """
> | select * from tmp
> | where dt like '202205100%'
> | """
> | ).show()
> +---+---+
> | f1| dt|
> +---+---+
> +---+---+ {code}
> 4. Do select using 'like any':
> {code:java}
> scala> spark.sql(
> | """
> | select * from tmp
> | where dt like any ('202205100%')
> | """
> | ).show()
> 22/05/20 14:50:26 WARN HiveConf: HiveConf of name hive.server2.thrift.url
> does not exist
> +---+----------+
> | f1| dt|
> +---+----------+
> | 1|2022051000|
> +---+----------+ {code}
> The expectation is that results 3 and 4 are identical. However, this is not
> the case: result 3 is wrong, because it omits the row inserted in step 2.
>