[
https://issues.apache.org/jira/browse/SPARK-14172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346209#comment-15346209
]
Jiang Xingbo commented on SPARK-14172:
--------------------------------------
In the collectProjectsAndFilters function, we currently collect filters only when
condition.deterministic is true. Consider a condition like this:
"SELECT value, hr FROM srcpart1 WHERE ds = '2008-04-08' AND rand() < 0.9"
where srcpart1 has ds as its partition key. Here the condition as a whole is not
deterministic, but we should still collect "ds = '2008-04-08'" because it is a
deterministic conjunct joined to the rest by 'AND'.
To fix this, I suggest we recursively split an 'AND' condition into its conjuncts,
put the deterministic parts into filters, and leave the remaining parts with the
child.
If this approach sounds plausible, I'm willing to create a PR to implement it.
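A minimal sketch of the idea, using an illustrative expression ADT rather than the actual Catalyst Expression classes (the names Expr, And, Pred, splitConjuncts, and splitForPushdown are all hypothetical):

```scala
// Sketch of the proposed fix: recursively split an AND condition into its
// conjuncts, then partition them so deterministic conjuncts can still be
// collected as pushable filters while the rest stay with the child plan.
sealed trait Expr { def deterministic: Boolean }

case class And(left: Expr, right: Expr) extends Expr {
  // An AND is deterministic only if both sides are.
  def deterministic: Boolean = left.deterministic && right.deterministic
}

// A leaf predicate, e.g. "ds = '2008-04-08'" or "rand() < 0.9".
case class Pred(sql: String, deterministic: Boolean) extends Expr

// Flatten nested ANDs into a flat sequence of conjuncts.
def splitConjuncts(e: Expr): Seq[Expr] = e match {
  case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
  case other     => Seq(other)
}

// Partition the conjuncts: deterministic ones become filters eligible for
// pushdown; nondeterministic ones remain with the child.
def splitForPushdown(cond: Expr): (Seq[Expr], Seq[Expr]) =
  splitConjuncts(cond).partition(_.deterministic)
```

For the example query above, splitForPushdown would return the partition predicate "ds = '2008-04-08'" as a pushable filter and keep "rand() < 0.9" with the child.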
> Hive table partition predicate not passed down correctly
> --------------------------------------------------------
>
> Key: SPARK-14172
> URL: https://issues.apache.org/jira/browse/SPARK-14172
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.1
> Reporter: Yingji Zhang
> Priority: Critical
>
> When the Hive SQL contains nondeterministic predicates, the Spark plan will
> not push down the partition predicate to the HiveTableScan. For example:
> {code}
> -- consider following query which uses a random function to sample rows
> SELECT *
> FROM table_a
> WHERE partition_col = 'some_value'
> AND rand() < 0.01;
> {code}
> The Spark plan does not push down the partition predicate to the
> HiveTableScan, which ends up scanning data from all partitions of the table.