[ https://issues.apache.org/jira/browse/SPARK-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Doi updated SPARK-8152:
----------------------------
Attachment: field_choice_pinpointed.png
Pinpointed the issue to the groupBy line. Grouping by "itemRecordId" from
one table rather than the other is what changes the behavior. This should
not happen: the join condition requires the two columns to be equal, so
either choice should give the same result.
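
To make the expected behavior concrete, here is a minimal sketch in plain Python (not Spark) of the invariant that appears to be violated. After an inner join on equal keys, grouping by the key from either side should give identical counts. The tables, their rows, and the helper names `inner_join` and `group_counts` are all made up for illustration; only the column name "itemRecordId" comes from this report.

```python
from collections import Counter

# Hypothetical stand-ins for the two source tables. Only the column name
# "itemRecordId" is taken from the report; all values are invented.
table_a = [
    {"itemRecordId": 1, "a_value": "x"},
    {"itemRecordId": 2, "a_value": "y"},
    {"itemRecordId": 2, "a_value": "z"},
]
table_b = [
    {"itemRecordId": 2, "b_value": "p"},
    {"itemRecordId": 3, "b_value": "q"},
]


def inner_join(left, right, key):
    """Nested-loop inner equi-join; each result is a (left_row, right_row) pair."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]


def group_counts(pairs, side, key):
    """Count joined rows per key value, reading the key from one side (0 or 1)."""
    return Counter(pair[side][key] for pair in pairs)


joined = inner_join(table_a, table_b, "itemRecordId")
counts_by_a = group_counts(joined, 0, "itemRecordId")
counts_by_b = group_counts(joined, 1, "itemRecordId")

# Every joined row satisfies left key == right key, so both groupings agree.
print(counts_by_a == counts_by_b)
```

If the same comparison on the real joined DataFrame gives different counts for the two sides, then some joined rows do not satisfy the join condition, which matches the behavior described in the issue.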
> Dataframe Join Ignores Condition
> --------------------------------
>
> Key: SPARK-8152
> URL: https://issues.apache.org/jira/browse/SPARK-8152
> Project: Spark
> Issue Type: Bug
> Reporter: Eric Doi
> Attachments: field_choice_pinpointed.png, side-by-side.png
>
>
> When joining two tables A and B, on condition that A.X = B.X, in some cases
> that condition is not fulfilled in the result.
> Suspect it might be due to duplicate column names in the source tables
> causing confusion. Is it possible for there to exist hidden fields in a
> dataframe?
> Will attach a screenshot for more details. The bug is reproducible but hard
> to pinpoint.