Stephen Link created SPARK-10933:
------------------------------------
Summary: Spark SQL Joins should have option to fail query when row
multiplication is encountered
Key: SPARK-10933
URL: https://issues.apache.org/jira/browse/SPARK-10933
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Stephen Link
Priority: Minor
When constructing Spark SQL queries, we commonly run into scenarios where users
have inadvertently caused a cartesian product or row expansion. It is sometimes
possible to detect this in advance with separate queries, but it would be far
better to have a setting that fails the query when a join key appears multiple
times on both sides of a join operation.
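
As an illustration of the "separate queries" approach, here is a minimal sketch of such a pre-check. It uses Python's built-in sqlite3 module as a stand-in for Spark SQL so that it runs without a cluster. The tables `orders` and `addresses`, the column `customer_id`, and the helper `keys_duplicated_on_both_sides` are hypothetical names chosen for the example, not part of Spark.

```python
import sqlite3


def keys_duplicated_on_both_sides(conn, left, right, key):
    """Return the join keys that occur more than once in BOTH tables."""
    # Identifiers are interpolated directly into the SQL text, so this
    # sketch assumes trusted table and column names.
    sql = f"""
        SELECT l.{key}
        FROM (SELECT {key} FROM {left}
              GROUP BY {key} HAVING COUNT(*) > 1) AS l
        JOIN (SELECT {key} FROM {right}
              GROUP BY {key} HAVING COUNT(*) > 1) AS r
          ON l.{key} = r.{key}
        ORDER BY l.{key}
    """
    return [row[0] for row in conn.execute(sql)]


# Hypothetical sample data: customer 1 is duplicated on both sides,
# customer 2 appears exactly once in each table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE addresses (customer_id INTEGER, city TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10.0), (1, 20.0), (2, 5.0)])
conn.executemany("INSERT INTO addresses VALUES (?, ?)",
                 [(1, "Oslo"), (1, "Bergen"), (2, "Paris")])

print(keys_duplicated_on_both_sides(conn, "orders", "addresses", "customer_id"))
```

A non-empty result means the join would multiply rows for those keys. The drawback, as noted above, is that this costs an extra pass over both inputs before the real query runs.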
This setting would belong in SQLConf. The functionality could likely be
implemented by forcing a sorted shuffle, then checking for duplicate join keys
in the streamed results.
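
To make the proposed check concrete, here is a rough single-machine sketch in plain Python. It is not Spark code: an in-memory sort stands in for the sorted shuffle, and the names `strict_merge_join` and `RowMultiplicationError` are hypothetical. The join walks both sorted inputs one key group at a time and fails as soon as a key has more than one row on both sides.

```python
from itertools import groupby
from operator import itemgetter


class RowMultiplicationError(Exception):
    """Raised when a join key is duplicated on both sides of the join."""


def _grouped(rows, key):
    # Sort by join key (stand-in for the sorted shuffle), then collect
    # the rows that share each key into one list.
    return [(k, list(g)) for k, g in groupby(sorted(rows, key=key), key=key)]


def strict_merge_join(left, right, key=itemgetter(0)):
    """Inner merge join that fails on many-to-many key matches."""
    lgroups, rgroups = _grouped(left, key), _grouped(right, key)
    i = j = 0
    out = []
    while i < len(lgroups) and j < len(rgroups):
        lk, lrows = lgroups[i]
        rk, rrows = rgroups[j]
        if lk < rk:
            i += 1          # key only on the left: no match
        elif lk > rk:
            j += 1          # key only on the right: no match
        else:
            if len(lrows) > 1 and len(rrows) > 1:
                raise RowMultiplicationError(
                    f"join key {lk!r} appears {len(lrows)} times on the left "
                    f"and {len(rrows)} times on the right")
            out.extend((l, r) for l in lrows for r in rrows)
            i += 1
            j += 1
    return out


# One-to-many matches are still allowed; only many-to-many keys fail.
print(strict_merge_join([(1, "a"), (2, "b"), (2, "c")], [(2, "x"), (3, "y")]))
```

Because the inputs are sorted, all rows for a key are adjacent, so the check only needs to look at one key group per side at a time. A real implementation would stream the groups rather than materialize them as this sketch does.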