[ https://issues.apache.org/jira/browse/SPARK-33098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33098:
------------------------------------

    Assignee: Apache Spark

> Exception when using 'in' to compare a partition column to a literal with the wrong type
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-33098
>                 URL: https://issues.apache.org/jira/browse/SPARK-33098
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Bruce Robbins
>            Assignee: Apache Spark
>            Priority: Major
>
> Comparing a partition column against a literal with the wrong type works if 
> you use equality ('='). However, if you use 'in', you get:
> {noformat}
> MetaException(message:Filtering is supported only on partition keys of type string)
> {noformat}
> For example:
> {noformat}
> spark-sql> create table test (a int) partitioned by (b int) stored as parquet;
> Time taken: 0.323 seconds
> spark-sql> insert into test values (1, 1), (1, 2), (2, 2);
> 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test
> 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test
> 20/10/08 19:57:14 WARN log: Updated size to 418
> 20/10/08 19:57:14 WARN log: Updated size to 836
> Time taken: 2.124 seconds
> spark-sql> -- this works, of course
> spark-sql> select * from test where b in (2);
> 1     2
> 2     2
> Time taken: 0.13 seconds, Fetched 2 row(s)
> spark-sql> -- this also works (equals with wrong type)
> spark-sql> select * from test where b = '2';
> 1     2
> 2     2
> Time taken: 0.132 seconds, Fetched 2 row(s)
> spark-sql> -- this does not work ('in' with wrong type)
> spark-sql> select * from test where b in ('2');
> 20/10/08 19:58:30 ERROR SparkSQLDriver: Failed in [select * from test where b in ('2')]
> java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK
>       at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828)
> -
> -
> -
> Caused by: MetaException(message:Filtering is supported only on partition keys of type string)
> {noformat}
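> As a side note, the error message above names a configuration workaround. A minimal sketch of applying it from spark-shell (sketch only, not verified against this exact repro; the message itself warns that it degrades performance because metastore partition pruning is lost):
> {noformat}
> // Sketch only: the config key comes straight from the error message above.
> spark.conf.set("spark.sql.hive.manageFilesourcePartitions", "false")
>
> // Drop any cached relation for the table so the new setting takes effect.
> spark.catalog.refreshTable("test")
>
> // The previously failing query should now avoid getPartitionsByFilter.
> spark.sql("select * from test where b in ('2')").show()
> {noformat}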
> There are also interesting variations of this using the DataFrame API:
> {noformat}
> scala> sql("select cast(b as string) as b from test where b in 
> (2)").show(false)
> +---+
> |b  |
> +---+
> |2  |
> |2  |
> +---+
> scala> sql("select cast(b as string) as b from test").filter("b in 
> (2)").show(false)
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive. You can set the Spark configuration 
> setting spark.sql.hive.manageFilesourcePartitions to false to work around 
> this problem, however this will result in degraded performance. Please report 
> a bug: https://issues.apache.org/jira/browse/SPARK
>   at 
> org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828)
> -
> -
> Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
> {noformat}
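> A hypothetical way to sidestep the mistyped pushdown in the DataFrame variation (sketch only, not verified here) is to keep the comparison against the partition column typed as int via the Column API, since an int literal works as shown above:
> {noformat}
> import org.apache.spark.sql.functions.col
>
> // Hypothetical sketch: filter on the int partition column with an int
> // literal first, then cast for the projection, so any filter pushed to
> // the metastore matches the partition key's type.
> spark.table("test")
>   .filter(col("b").isin(2))
>   .selectExpr("cast(b as string) as b")
>   .show(false)
> {noformat}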


