Stephen Link created SPARK-10933:
------------------------------------

             Summary: Spark SQL Joins should have option to fail query when row 
multiplication is encountered
                 Key: SPARK-10933
                 URL: https://issues.apache.org/jira/browse/SPARK-10933
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Stephen Link
            Priority: Minor


When constructing Spark SQL queries, we commonly run into scenarios where users 
have inadvertently caused a Cartesian product or row expansion. It is sometimes 
possible to detect this in advance with separate queries, but it would be far 
better to have a setting that fails the query when a join key appears multiple 
times on both sides of a join operation.
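
For illustration, the "separate queries" pre-check mentioned above might look 
like the following. This is only a sketch: it uses Python's standard-library 
sqlite3 as a stand-in engine so that it runs without a cluster, and the table 
names, column names, and helper functions are hypothetical. The GROUP BY / 
HAVING query itself is plain SQL.

```python
import sqlite3


def duplicated_keys(conn, table, key):
    """Return the values of `key` that occur more than once in `table`."""
    # Identifiers are interpolated directly, so this sketch must only be
    # used with trusted table and column names.
    sql = (
        f"SELECT {key} FROM {table} "
        f"GROUP BY {key} HAVING COUNT(*) > 1 ORDER BY {key}"
    )
    return [row[0] for row in conn.execute(sql)]


def join_would_multiply(conn, left, right, key):
    """True if some key value is duplicated on BOTH sides of the join."""
    left_dups = set(duplicated_keys(conn, left, key))
    right_dups = set(duplicated_keys(conn, right, key))
    return bool(left_dups & right_dups)


# Hypothetical sample data: customer 1 is duplicated in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    CREATE TABLE addresses (customer_id INTEGER, city TEXT);
    INSERT INTO orders VALUES (1, 10.0), (1, 12.5), (2, 7.0);
    INSERT INTO addresses VALUES (1, 'Austin'), (1, 'Boston'), (2, 'Chicago');
""")

if join_would_multiply(conn, "orders", "addresses", "customer_id"):
    print("join on customer_id would multiply rows")
```

The drawback is the one described above: every join needs two extra 
aggregation queries before it runs, which is why a built-in setting would be 
preferable.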

This setting would belong in SQLConf. The functionality could likely be 
implemented by forcing a sorted shuffle, then checking the streamed results 
for duplicate join keys.
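
A rough model of that idea, in plain Python rather than Spark internals: both 
inputs are sorted by key (standing in for the sorted shuffle), then merged 
group by group, and the join fails as soon as one key has more than one row on 
both sides. The names checked_merge_join and RowMultiplicationError are 
hypothetical and do not correspond to anything in Spark.

```python
from itertools import groupby
from operator import itemgetter


class RowMultiplicationError(Exception):
    """Raised when a join key is duplicated on both sides of the join."""


def _sorted_groups(rows, key):
    # Sorting stands in for the sorted shuffle; each item is (key, [rows]).
    return ((k, list(g)) for k, g in groupby(sorted(rows, key=key), key=key))


def checked_merge_join(left, right, key=itemgetter(0)):
    """Inner merge join that fails on many-to-many key matches."""
    lgroups = _sorted_groups(left, key)
    rgroups = _sorted_groups(right, key)
    lcur, rcur = next(lgroups, None), next(rgroups, None)
    while lcur is not None and rcur is not None:
        (lk, lrows), (rk, rrows) = lcur, rcur
        if lk < rk:
            lcur = next(lgroups, None)
        elif lk > rk:
            rcur = next(rgroups, None)
        else:
            # Same key on both sides: duplicates on both would multiply rows.
            if len(lrows) > 1 and len(rrows) > 1:
                raise RowMultiplicationError(
                    f"key {lk!r}: {len(lrows)} left rows x {len(rrows)} right rows"
                )
            for lrow in lrows:
                for rrow in rrows:
                    yield lrow, rrow
            lcur, rcur = next(lgroups, None), next(rgroups, None)


# One-to-many is allowed; many-to-many raises during iteration.
ok = list(checked_merge_join([(2, "b"), (1, "a")], [(1, "x"), (1, "y"), (3, "z")]))
```

Because the check runs while the sorted groups are streamed, no separate 
pre-query is needed. The cost in this sketch is holding the rows of one key 
group per side in memory at a time.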



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
