[jira] [Commented] (SPARK-39241) Spark SQL 'Like' operator behaves wrongly while filtering on partitioned column after Spark 3.1

Yuming Wang (Jira) Sun, 22 May 2022 06:50:08 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-39241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540601#comment-17540601
 ]


Yuming Wang commented on SPARK-39241:
-------------------------------------

I can't reproduce this issue:

{code:scala}
    spark.sql(
      """
        | CREATE EXTERNAL TABLE tmp( f1 STRING) PARTITIONED BY (dt STRING)
        | ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
        | STORED AS TEXTFILE LOCATION 'file://tmp/tmp/'
      """.stripMargin)

    spark.sql("""insert into table tmp partition(dt="2022051000") values("1")""")

    spark.sql("select * from tmp where dt like '202205100%'").show()

    spark.sql("select * from tmp where dt like any ('202205100%')").show()
{code}
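
One way to make the comparison explicit is to check the two result sets against each other and print both query plans. This is only a minimal sketch: it assumes it runs in spark-shell (so `spark` is already defined, with Hive support) after the statements above, and the value names `likeQuery`, `likeAnyQuery`, `likeRows` and `likeAnyRows` are illustrative.

```scala
// Assumes `spark` is the session provided by spark-shell and that the
// `tmp` table created by the statements above exists.
val likeQuery    = spark.sql("select * from tmp where dt like '202205100%'")
val likeAnyQuery = spark.sql("select * from tmp where dt like any ('202205100%')")

// Collect both results as sets so that row order does not matter.
val likeRows    = likeQuery.collect().toSet
val likeAnyRows = likeAnyQuery.collect().toSet

// With a single pattern the two predicates are equivalent,
// so the two result sets are expected to be equal.
assert(likeRows == likeAnyRows,
  s"LIKE returned $likeRows but LIKE ANY returned $likeAnyRows")

// Print the parsed, analyzed, optimized and physical plans of each query.
likeQuery.explain(true)
likeAnyQuery.explain(true)
```

If the assertion fails in an affected environment, comparing the two printed plans may show where the partition filter is handled differently.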


> Spark SQL 'Like' operator behaves wrongly while filtering on partitioned 
> column after Spark 3.1
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-39241
>                 URL: https://issues.apache.org/jira/browse/SPARK-39241
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.2
>         Environment: *Environment: EMR*
> Release label:emr-6.5.0
> Hadoop distribution:Amazon 3.2.1
> Applications:{*}Spark 3.1.2{*}, Hive 3.1.2, Livy 0.7.1
>            Reporter: Dmitry Gorbatsevich
>            Priority: Major
>
> It seems that the introduction of "like any" in Spark 3.1 breaks "like" 
> behaviour when filtering on a partitioned column. Here is an example:
> 1. Create a test table:
> {code:java}
> scala> spark.sql(
>      | """
>      | CREATE EXTERNAL TABLE tmp(
>      |         f1 STRING
>      |     )
>      |     PARTITIONED BY (dt STRING)
>      |     ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
>      |     LINES TERMINATED BY '\n'
>      |     STORED AS TEXTFILE
>      |     LOCATION 's3://vlg-data-us-east-1/tmp/tmp/';
>      | """) 
> res2: org.apache.spark.sql.DataFrame = []{code}
> 2. Insert a row:
> {code:java}
> scala> spark.sql(
>      | """
>      |     insert into table tmp partition(dt="2022051000") values("1")
>      | """
>      | )
> res3: org.apache.spark.sql.DataFrame = [] {code}
> 3. Do select using 'like':
> {code:java}
> scala> spark.sql(
>      |     """
>      |         select * from tmp
>      |         where dt like '202205100%'
>      |     """
>      |     ).show()
> +---+---+
> | f1| dt|
> +---+---+
> +---+---+ {code}
> 4. Do select using 'like any':
> {code:java}
> scala> spark.sql(
>      |     """
>      |         select * from tmp
>      |         where dt like any ('202205100%')
>      |     """
>      |     ).show()
> 22/05/20 14:50:26 WARN HiveConf: HiveConf of name hive.server2.thrift.url 
> does not exist
> +---+----------+
> | f1|        dt|
> +---+----------+
> |  1|2022051000|
> +---+----------+ {code}
> The expectation is that results 3 and 4 are identical. However, this is not 
> the case: result #3 is empty even though the inserted row matches the pattern.
>  
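
For reference, the expected LIKE semantics for the value in this report can be sketched in plain Scala, independent of Spark. The helpers `likeToRegex` and `likeMatches` are hypothetical names introduced only for illustration and are not Spark's implementation; escape characters are not handled.

```scala
import java.util.regex.Pattern

// Hypothetical helper (not Spark's implementation): translate a SQL LIKE
// pattern into a Java regex. '%' matches any run of characters, '_' matches
// exactly one character, and every other character is matched literally.
// Escape characters are deliberately not handled in this sketch.
def likeToRegex(pattern: String): String =
  pattern.toSeq.map {
    case '%' => ".*"
    case '_' => "."
    case c   => Pattern.quote(c.toString)
  }.mkString

// True when the whole value matches the LIKE pattern.
def likeMatches(value: String, pattern: String): Boolean =
  Pattern.compile(likeToRegex(pattern), Pattern.DOTALL).matcher(value).matches()

// The partition value from the report starts with the pattern's literal prefix.
println(likeMatches("2022051000", "202205100%"))
```

Under these semantics the partition value `2022051000` matches `202205100%`, which is why a non-empty result is expected in step 3.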


