[
https://issues.apache.org/jira/browse/SPARK-27582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-27582.
----------------------------------
Resolution: Won't Fix
I don't think we should add a set of aliases. The existing way of doing these
joins already looks very easy.
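
For reference, a minimal sketch of what the existing way presumably refers to: the untyped Dataset.join with the "left_semi" or "left_anti" join type, followed by .as[T] to get back a typed Dataset. The case classes, object name and helper names below are hypothetical and exist only for this illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical case classes, used only for this illustration.
case class Order(id: Long, customerId: Long)
case class Customer(id: Long)

object SemiAntiJoinExample {
  // Orders that have at least one matching customer (left_semi).
  def withCustomer(orders: Dataset[Order], customers: Dataset[Customer]): Dataset[Order] = {
    import orders.sparkSession.implicits._
    // A left_semi join returns only the left side's columns, so .as[Order] restores the type.
    orders.join(customers, orders("customerId") === customers("id"), "left_semi").as[Order]
  }

  // Orders that have no matching customer (left_anti).
  def withoutCustomer(orders: Dataset[Order], customers: Dataset[Customer]): Dataset[Order] = {
    import orders.sparkSession.implicits._
    orders.join(customers, orders("customerId") === customers("id"), "left_anti").as[Order]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("semi-anti-join").getOrCreate()
    import spark.implicits._

    val orders = Seq(Order(1, 10), Order(2, 20), Order(3, 30)).toDS()
    val customers = Seq(Customer(10), Customer(30)).toDS()

    withCustomer(orders, customers).show()    // orders 1 and 3
    withoutCustomer(orders, customers).show() // order 2

    spark.stop()
  }
}
```

The result is a Dataset[Order] rather than a tuple, because semi and anti joins never return columns from the right side.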
> Add Dataset DSL for left_anti and left_semi joins
> -------------------------------------------------
>
> Key: SPARK-27582
> URL: https://issues.apache.org/jira/browse/SPARK-27582
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.2
> Reporter: Stanislav Bytsko
> Priority: Major
>
> Currently we have
> {code:java}
> org.apache.spark.sql.Dataset[T]#joinWith[U](other: Dataset[U], condition:
> Column, joinType: String): Dataset[(T, U)]
> {code}
> which explicitly excludes left_anti and left_semi joins. That is
> understandable, because the result would have a different signature.
> I think this is an easily fixed drawback, and I can think of 2 solutions:
> - Extend the current joinWith to return null for the second (_2) item in the
> tuple. Not ideal, as no one likes nulls, but workable, since the client can
> handle that by calling {code}.map(_._1){code} immediately afterwards
> - Add 2 new methods
> {code}org.apache.spark.sql.Dataset[T]#joinSemiWith[U](other: Dataset[U],
> condition: Column): Dataset[T]{code} and
> {code}org.apache.spark.sql.Dataset[T]#joinAntiWith[U](other: Dataset[U],
> condition: Column): Dataset[T]{code}, which is much nicer but adds 2 methods
> to the API. The method names could instead be semiJoinWith and antiJoinWith,
> which are more logical but would not be sorted properly in the list of
> org.apache.spark.sql.Dataset methods
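
Since the issue was resolved as Won't Fix, the second option can still be approximated in user code. The following is a rough sketch, assuming an implicit Encoder[T] is in scope; the method names joinSemiWith and joinAntiWith come from the proposal above, while the enclosing object, the implicit class and the case classes are hypothetical and not part of Spark's API.

```scala
import org.apache.spark.sql.{Column, Dataset, Encoder, SparkSession}

object DatasetJoinSyntax {
  // Hypothetical user-side extension methods; Spark itself does not provide these.
  implicit class SemiAntiJoinOps[T: Encoder](val ds: Dataset[T]) {
    // Keeps the rows of ds that have at least one match in other.
    def joinSemiWith[U](other: Dataset[U], condition: Column): Dataset[T] =
      ds.join(other, condition, "left_semi").as[T]

    // Keeps the rows of ds that have no match in other.
    def joinAntiWith[U](other: Dataset[U], condition: Column): Dataset[T] =
      ds.join(other, condition, "left_anti").as[T]
  }
}

// Hypothetical case classes, used only for this illustration.
case class Item(id: Long, groupId: Long)
case class Group(id: Long)

object DatasetJoinSyntaxExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("join-syntax").getOrCreate()
    import spark.implicits._
    import DatasetJoinSyntax._

    val items = Seq(Item(1, 10), Item(2, 20), Item(3, 30)).toDS()
    val groups = Seq(Group(10), Group(30)).toDS()
    val cond = items("groupId") === groups("id")

    items.joinSemiWith(groups, cond).show() // items 1 and 3
    items.joinAntiWith(groups, cond).show() // item 2

    spark.stop()
  }
}
```

Both methods delegate to the same untyped join, so this only adds a typed wrapper and does not change how Spark plans the join.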