[jira] [Updated] (SPARK-33098) Explicit or implicit casts involving partition columns can sometimes result in a MetaException.

Bruce Robbins (Jira) Thu, 29 Oct 2020 18:34:24 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-33098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Bruce Robbins updated SPARK-33098:
----------------------------------
    Description: 
The following cases throw
{{MetaException(message:Filtering is supported only on partition keys of type 
string)}}
{noformat}
sql("create table test (a int) partitioned by (b int) stored as parquet")
sql("insert into test values (1, 1), (1, 2), (2, 2)")

// These throw MetaExceptions
sql("select * from test where b in ('2')").show(false)
sql("select * from test where cast(b as string) = '2'").show(false)
sql("select * from test where cast(b as string) in ('2')").show(false)
sql("select * from test where cast(b as string) in (2)").show(false)
sql("select cast(b as string) as b from test where b in ('2')").show(false)
sql("select cast(b as string) as b from test").filter("b = '2'").show(false) // 
[1]
sql("select cast(b as string) as b from test").filter("b in (2)").show(false) 
// [2]
sql("select cast(b as string) as b from test").filter("b in ('2')").show(false)
sql("select * from test where cast(b as string) > '1'").show(false)
sql("select cast(b as string) b from test").filter("b > '1'").show(false) // [3]

// [1] but not sql("select cast(b as string) as b from test where b = 
'2'").show(false)
// [2] but not sql("select cast(b as string) as b from test where b in 
(2)").show(false)
// [3] but not sql("select cast(b as string) b from test where b > 
'1'").show(false)
{noformat}
The message ("Filtering is supported only on partition keys of type string") is 
misleading. Filter *is* supported on integer columns, for example.

  was:
Comparing a partition column against a literal with the wrong type works if you 
use equality ('='). However, if you use 'in', you get:
{noformat}
MetaException(message:Filtering is supported only on partition keys of type 
string)
{noformat}
For example:
{noformat}
spark-sql> create table test (a int) partitioned by (b int) stored as parquet;
Time taken: 0.323 seconds
spark-sql> insert into test values (1, 1), (1, 2), (2, 2);
20/10/08 19:57:14 WARN log: Updating partition stats fast for: test
20/10/08 19:57:14 WARN log: Updating partition stats fast for: test
20/10/08 19:57:14 WARN log: Updated size to 418
20/10/08 19:57:14 WARN log: Updated size to 836
Time taken: 2.124 seconds

spark-sql> -- this works, of course
spark-sql> select * from test where b in (2);
1       2
2       2
Time taken: 0.13 seconds, Fetched 2 row(s)

spark-sql> -- this also works (equals with wrong type)
spark-sql> select * from test where b = '2';
1       2
2       2
Time taken: 0.132 seconds, Fetched 2 row(s)

spark-sql> -- this does not work ('in' with wrong type)
spark-sql> select * from test where b in ('2');
20/10/08 19:58:30 ERROR SparkSQLDriver: Failed in [select * from test where b 
in ('2')]
java.lang.RuntimeException: Caught Hive MetaException attempting to get 
partition metadata by filter from Hive. You can set the Spark configuration 
setting spark.sql.hive.manageFilesourcePartitions to false to work around this 
problem, however this will result in degraded performance. Please report a bug: 
https://issues.apache.org/jira/browse/SPARK
        at 
org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828)
-
-
-
Caused by: MetaException(message:Filtering is supported only on partition keys 
of type string)
{noformat}
There are also interesting variations of this using the dataframe API:
{noformat}
scala> sql("select cast(b as string) as b from test where b in (2)").show(false)
+---+
|b  |
+---+
|2  |
|2  |
+---+


scala> sql("select cast(b as string) as b from test").filter("b in 
(2)").show(false)
java.lang.RuntimeException: Caught Hive MetaException attempting to get 
partition metadata by filter from Hive. You can set the Spark configuration 
setting spark.sql.hive.manageFilesourcePartitions to false to work around this 
problem, however this will result in degraded performance. Please report a bug: 
https://issues.apache.org/jira/browse/SPARK
  at 
org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828)
-
-
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is 
supported only on partition keys of type string
{noformat}
Also this:
{noformat}
scala> sql("select cast(b as string) as b from test").filter("b in 
('2')").show(false)
java.lang.RuntimeException: Caught Hive MetaException attempting to get 
partition metadata by filter from Hive. You can set the Spark configuration 
setting spark.sql.hive.manageFilesourcePartitions to false to work around this 
problem, however this will result in degraded performance. Please report a bug: 
https://issues.apache.org/jira/browse/SPARK
-
-
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is 
supported only on partition keys of type string
{noformat}



> Explicit or implicit casts involving partition columns can sometimes result 
> in a MetaException.
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-33098
>                 URL: https://issues.apache.org/jira/browse/SPARK-33098
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Bruce Robbins
>            Priority: Major
>
> The following cases throw
> {{MetaException(message:Filtering is supported only on partition keys of type 
> string)}}
> {noformat}
> sql("create table test (a int) partitioned by (b int) stored as parquet")
> sql("insert into test values (1, 1), (1, 2), (2, 2)")
> // These throw MetaExceptions
> sql("select * from test where b in ('2')").show(false)
> sql("select * from test where cast(b as string) = '2'").show(false)
> sql("select * from test where cast(b as string) in ('2')").show(false)
> sql("select * from test where cast(b as string) in (2)").show(false)
> sql("select cast(b as string) as b from test where b in ('2')").show(false)
> sql("select cast(b as string) as b from test").filter("b = '2'").show(false) 
> // [1]
> sql("select cast(b as string) as b from test").filter("b in (2)").show(false) 
> // [2]
> sql("select cast(b as string) as b from test").filter("b in 
> ('2')").show(false)
> sql("select * from test where cast(b as string) > '1'").show(false)
> sql("select cast(b as string) b from test").filter("b > '1'").show(false) // 
> [3]
> // [1] but not sql("select cast(b as string) as b from test where b = 
> '2'").show(false)
> // [2] but not sql("select cast(b as string) as b from test where b in 
> (2)").show(false)
> // [3] but not sql("select cast(b as string) b from test where b > 
> '1'").show(false)
> {noformat}
> The message ("Filtering is supported only on partition keys of type string") 
> is misleading. Filter *is* supported on integer columns, for example.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (SPARK-33098) Explicit or implicit casts involving partition columns can sometimes result in a MetaException.

Reply via email to