[https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598284#comment-16598284]
Evelyn Bayes edited comment on SPARK-25150 at 8/31/18 6:00 AM:
---------------------------------------------------------------
Sorry, my attachment doesn't want to stick. Feel free to ask me to email it,
or explain to me how attachments work. Sorry!
was (Author: eeveeb):
Sorry, my attachment doesn't want to stick. I'll give it another try.
[^zombie-analysis.py]
> Joining DataFrames derived from the same source yields confusing/incorrect results
> ----------------------------------------------------------------------------------
>
> Key: SPARK-25150
> URL: https://issues.apache.org/jira/browse/SPARK-25150
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.1
> Reporter: Nicholas Chammas
> Priority: Major
> Attachments: output-with-implicit-cross-join.txt,
> output-without-implicit-cross-join.txt, persons.csv, states.csv,
> zombie-analysis.py
>
>
> I have two DataFrames, A and B. From B, I have derived two additional
> DataFrames, B1 and B2. When joining A to B1 and B2, I'm getting a very
> confusing error:
> {code:java}
> Join condition is missing or trivial.
> Either: use the CROSS JOIN syntax to allow cartesian products between these
> relations, or: enable implicit cartesian products by setting the configuration
> variable spark.sql.crossJoin.enabled=true;
> {code}
> Then, when I configure "spark.sql.crossJoin.enabled=true" as instructed,
> Spark appears to give me incorrect answers.
> I am not sure if I am missing something obvious, or if there is some kind of
> bug here. The "join condition is missing" error is confusing and doesn't make
> sense to me, and the seemingly incorrect output is concerning.
> I've attached a reproduction, along with the output I'm seeing with and
> without the implicit cross join enabled.
> I realize the join I've written is not correct, in the sense that it should be
> a left outer join instead of an inner join (since some of the aggregates are
> not available for all states), but that doesn't explain Spark's behavior.