[ 
https://issues.apache.org/jira/browse/SPARK-27814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

feiwang updated SPARK-27814:
----------------------------
    Description: 
For a partitioned table, such as:

{code:sql}
create table test (c1 int, c2 string) partitioned by (c3 int)
{code}

If a query casts the partition column, for example:

{code:sql}
select * from test where cast(c3 as string) = '0'
{code}

One predicate of this query is cast(c3 as string) = '0'.
The following case is matched to convert this predicate into a partition filter.

{code:java}
      case op @ SpecialBinaryComparison(
          ExtractAttribute(NonVarcharAttribute(name)), ExtractableLiteral(value)) =>
        Some(s"$name ${op.symbol} $value")
{code}

First, ExtractAttribute.unapply is invoked to check whether c3 can be cast to 
string; it can, so the cast is dropped and the original attribute c3 is returned.
Then NonVarcharAttribute is applied to that original attribute. Because the Hive 
type of c3 is not varchar, it matches, and the predicate is converted to 
c3 = "0" and pushed down.

However, c3 is an int partition key, and filtering with a string value is 
supported only on partition keys of type string, so the pushed-down filter 
triggers an exception.
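
The matching steps above can be reproduced with a small standalone model. This is only a sketch: Attribute, Cast, Literal and EqualTo are simplified stand-ins rather than Spark's Catalyst classes, types are plain strings, and the extractor bodies are assumptions that mirror the behavior described in this issue (every cast is treated as safe to unwrap, and only equality is handled).

{code:scala}
object CastPushdownSketch {
  // Hypothetical, simplified expression tree (not Spark's real classes).
  sealed trait Expr
  case class Attribute(name: String, dataType: String) extends Expr
  case class Cast(child: Expr, to: String) extends Expr
  case class Literal(value: Any) extends Expr
  case class EqualTo(left: Expr, right: Expr) extends Expr

  // Assumption: looks through a cast and returns the underlying attribute,
  // discarding the cast's target type.
  object ExtractAttribute {
    def unapply(e: Expr): Option[Attribute] = e match {
      case a: Attribute   => Some(a)
      case Cast(child, _) => unapply(child)
      case _              => None
    }
  }

  // Assumption: accepts any attribute whose type is not varchar.
  object NonVarcharAttribute {
    def unapply(a: Attribute): Option[String] =
      if (a.dataType == "varchar") None else Some(a.name)
  }

  // Assumption: string literals are rendered in double quotes.
  object ExtractableLiteral {
    def unapply(e: Expr): Option[String] = e match {
      case Literal(s: String) => Some("\"" + s + "\"")
      case Literal(i: Int)    => Some(i.toString)
      case _                  => None
    }
  }

  // Same shape as the case quoted above, restricted to equality.
  def convert(pred: Expr): Option[String] = pred match {
    case EqualTo(ExtractAttribute(NonVarcharAttribute(name)), ExtractableLiteral(value)) =>
      Some(s"$name = $value")
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    val c3 = Attribute("c3", "int")
    // cast(c3 as string) = '0'
    val pred = EqualTo(Cast(c3, "string"), Literal("0"))
    println(convert(pred)) // prints: Some(c3 = "0")
  }
}
{code}

In this model the cast disappears before the type check runs, so an int partition key ends up compared with a quoted string literal in the generated filter, which is the mismatch described above.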



> The cast operation for partitioned column may push down uncorrect filter, 
> which is fatal.
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-27814
>                 URL: https://issues.apache.org/jira/browse/SPARK-27814
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.3
>            Reporter: feiwang
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
