[
https://issues.apache.org/jira/browse/SPARK-10740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin Huai updated SPARK-10740:
-----------------------------
Priority: Blocker (was: Major)
> handle nondeterministic expressions correctly for set operations
> ----------------------------------------------------------------
>
> Key: SPARK-10740
> URL: https://issues.apache.org/jira/browse/SPARK-10740
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Priority: Blocker
> Fix For: 1.6.0, 1.5.1
>
>
> We should only push down deterministic filter conditions to set operators.
> For Union, let's say we apply a non-deterministic filter to 1...5 union 1...5,
> and we get 1,3 for the left side and 2,4 for the right side; then the
> result should be 1,3,2,4. If we push down this filter, we get 1,3 for both
> sides (we create a new random object with the same seed on each side) and the
> result would be 1,3,1,3.
> For Intersect, let's say there is a non-deterministic condition with a 0.5
> probability of accepting a row, and we have a row that is present in both
> sides of an Intersect. Once we push down this condition, the row must pass it
> on each side, so the probability of accepting this row becomes 0.25.
> For Except, let's say there is a row that is present in both sides of an
> Except. This row should not be in the final output. However, if we push down a
> non-deterministic condition, it is possible that this row is rejected on the
> right side but kept on the left, and then we output a row that should not be
> part of the result.
> We should only push down deterministic projections to Union.
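
The three cases above can be sketched as a small simulation. This is plain Python, not Spark code: `nondeterministic_filter`, `simulate_shared_row`, and the seed values are hypothetical names chosen for illustration, and the Intersect/Except part assumes the two sides draw independently.

```python
import random

def nondeterministic_filter(rows, seed, p=0.5):
    """Keep each row with probability p, drawing from a fresh RNG seeded with `seed`."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < p]

SEED = 42  # arbitrary; any fixed seed shows the same effect
left = [1, 2, 3, 4, 5]
right = [1, 2, 3, 4, 5]

# Union, filter kept above the set operation: one RNG stream covers all ten rows.
union_filter_above = nondeterministic_filter(left + right, SEED)

# Union, filter pushed down: each side builds its own RNG from the same seed,
# so both sides make identical keep/drop decisions.
pushed_left = nondeterministic_filter(left, SEED)
pushed_right = nondeterministic_filter(right, SEED)
union_pushed_down = pushed_left + pushed_right

def simulate_shared_row(trials=100_000, p=0.5, seed=7):
    """Estimate outcomes for one row present on both sides (independent draws assumed)."""
    rng = random.Random(seed)
    intersect_above = intersect_pushed = except_pushed = 0
    for _ in range(trials):
        # Filter above Intersect: the row survives the Intersect, then one draw decides.
        intersect_above += rng.random() < p
        # Filter pushed down: one draw per side.
        keep_left = rng.random() < p
        keep_right = rng.random() < p
        intersect_pushed += keep_left and keep_right
        # Except emits the row only if it is kept on the left and dropped on the right.
        except_pushed += keep_left and not keep_right
    return (intersect_above / trials, intersect_pushed / trials, except_pushed / trials)

rates = simulate_shared_row()
print("union, filter above:", union_filter_above)
print("union, pushed down: ", union_pushed_down)
print("intersect above / intersect pushed / except pushed:", rates)
```

With the filter pushed down, the Union output is the same sub-list repeated twice, the Intersect acceptance rate drops from about 0.5 to about 0.25, and the Except emits the shared row in roughly a quarter of the trials, whereas without pushdown it would never appear.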