[
https://issues.apache.org/jira/browse/SPARK-17734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654971#comment-15654971
]
Leonardo Yvens commented on SPARK-17734:
----------------------------------------
Hello, [~dongjoon] and [~pdxleif]. The issue asks for a method that returns a
typed Dataset, but the suggestion in the first comment returns a DataFrame, so it
doesn't fix the issue. Maybe what you want is a `joinWith(other: Dataset[U],
usingColumn: String): Dataset[(T, U)]`? I think that method should not exist
because it would have to keep the join column duplicated (once in each element
of the tuple), which is inconsistent with the `join` method, which does not
duplicate the column. In that case this would be closed as Won't Fix. Is this
correct, or am I misunderstanding the issue?
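
To make the difference concrete, here is a minimal sketch that uses only the existing `join` and `joinWith` methods. The case classes `RecA` and `RecB`, the object name, and the sample data are assumptions made up for illustration; they are not from the issue.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record types, used only for illustration.
case class RecA(foo: Int, a: String)
case class RecB(foo: Int, b: String)

object JoinReturnTypes {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("join-return-types")
      .getOrCreate()
    import spark.implicits._

    val left: Dataset[RecA] = Seq(RecA(1, "x"), RecA(2, "y")).toDS()
    val right: Dataset[RecB] = Seq(RecB(1, "p"), RecB(3, "q")).toDS()

    // Existing shorthand: the result is an untyped DataFrame and the
    // join column "foo" appears only once.
    val untyped: DataFrame = left.join(right, "foo")
    assert(untyped.columns.toSeq == Seq("foo", "a", "b"))

    // Existing typed join: takes a Column condition and keeps both records
    // whole, so "foo" is present in each element of the pair.
    val typed: Dataset[(RecA, RecB)] =
      left.joinWith(right, left("foo") === right("foo"))
    assert(typed.collect().toSeq == Seq((RecA(1, "x"), RecB(1, "p"))))

    spark.stop()
  }
}
```

The pair returned by `joinWith` carries `foo` twice, which is the duplication described above.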
> inner equi-join shorthand that returns Datasets, like DataFrame already has
> ---------------------------------------------------------------------------
>
> Key: SPARK-17734
> URL: https://issues.apache.org/jira/browse/SPARK-17734
> Project: Spark
> Issue Type: Wish
> Reporter: Leif Warner
> Priority: Minor
>
> There's an existing ".join(right: Dataset[_], usingColumn: String):
> DataFrame" method on Dataset.
> I would appreciate it if a variant that returns typed Datasets were also
> available.
> If you write a join condition that contains the common column name, you get
> an AnalysisException because the reference is ambiguous, e.g.:
> $"foo" === $"foo"
> So I wrote table1.toDF()("foo") === table2.toDF()("foo"), but that's a little
> error-prone, and coworkers considered it a hack and didn't want to use it
> because it "mixes DataFrame and Dataset api".
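
One possible way to avoid both the ambiguous reference and the `toDF()` workaround is to alias each side before calling `joinWith`. The sketch below assumes that approach; `equiJoinWith` is a hypothetical helper and not a Spark method, and `Table1Row`, `Table2Row`, and the sample data are likewise invented for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical record types, used only for illustration.
case class Table1Row(foo: Int, a: String)
case class Table2Row(foo: Int, b: String)

object TypedEquiJoin {
  // Hypothetical helper, not part of Spark: aliases both sides so the
  // shared column name can be referenced unambiguously in the condition.
  def equiJoinWith[T, U](
      left: Dataset[T],
      right: Dataset[U],
      usingColumn: String): Dataset[(T, U)] =
    left.as("l").joinWith(
      right.as("r"),
      col(s"l.$usingColumn") === col(s"r.$usingColumn"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("typed-equi-join")
      .getOrCreate()
    import spark.implicits._

    val table1 = Seq(Table1Row(1, "x"), Table1Row(2, "y")).toDS()
    val table2 = Seq(Table2Row(1, "p"), Table2Row(3, "q")).toDS()

    // Inner equi-join on "foo"; only the rows with foo == 1 match.
    val joined: Dataset[(Table1Row, Table2Row)] =
      equiJoinWith(table1, table2, "foo")
    assert(joined.collect().toSeq ==
      Seq((Table1Row(1, "x"), Table2Row(1, "p"))))

    spark.stop()
  }
}
```

The helper stays within the Dataset API, but the result still contains the join column on both sides of each pair.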