[ 
https://issues.apache.org/jira/browse/SPARK-14172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350624#comment-15350624
 ] 

Wenchen Fan commented on SPARK-14172:
-------------------------------------

I'm not sure it's safe to push it down. For non-deterministic expressions, 
the order (or number) of input rows matters. If we push down the deterministic 
part of the filter condition, then the input rows to the remaining filter 
condition will change, which may produce a wrong answer.
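
To make the concern concrete, here is a minimal sketch in plain Python (not 
Spark code). It assumes a hypothetical stateful predicate as a stand-in for a 
non-deterministic expression such as rand() with a fixed seed: its result 
depends on how many rows it has already seen. The row data and all function 
names are made up for illustration.

```python
from itertools import count


def make_stateful_predicate():
    """Hypothetical stand-in for a non-deterministic expression.

    The result depends on how many rows were evaluated before this one,
    not on the contents of the row itself.
    """
    counter = count()
    return lambda row: next(counter) % 2 == 0


def in_partition(row):
    # Deterministic part of the condition (the partition predicate).
    return row["part"] == "a"


def without_pushdown(rows):
    # The stateful predicate is evaluated on every scanned row.
    stateful = make_stateful_predicate()
    return [r["id"] for r in rows if stateful(r) and in_partition(r)]


def with_pushdown(rows):
    # The partition predicate is applied first, so the stateful
    # predicate sees fewer rows, at different positions.
    stateful = make_stateful_predicate()
    pruned = [r for r in rows if in_partition(r)]
    return [r["id"] for r in pruned if stateful(r)]


rows = [
    {"id": 1, "part": "a"},
    {"id": 2, "part": "b"},
    {"id": 3, "part": "a"},
    {"id": 4, "part": "a"},
]

print(without_pushdown(rows))  # [1, 3]
print(with_pushdown(rows))     # [1, 4]
```

Both plans apply the same two predicates, yet they select different rows, 
because the pushed-down filter changes the input to the stateful one. Whether 
that difference counts as a wrong answer for a sampling query like the one in 
this issue is the open question.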

> Hive table partition predicate not passed down correctly
> --------------------------------------------------------
>
>                 Key: SPARK-14172
>                 URL: https://issues.apache.org/jira/browse/SPARK-14172
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Yingji Zhang
>            Priority: Critical
>
> When a Hive SQL query contains non-deterministic expressions, the Spark plan 
> does not push down the partition predicate to the HiveTableScan. For example:
> {code}
> -- consider the following query, which uses a random function to sample rows
> SELECT *
> FROM table_a
> WHERE partition_col = 'some_value'
> AND rand() < 0.01;
> {code}
> The Spark plan does not push down the partition predicate to HiveTableScan, 
> so it ends up scanning the data of all partitions in the table.



