Stanislav Bytsko created SPARK-27582:
----------------------------------------
Summary: Add Dataset DSL for left_anti and left_semi joins
Key: SPARK-27582
URL: https://issues.apache.org/jira/browse/SPARK-27582
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.2
Reporter: Stanislav Bytsko
Currently we have
{code:java}
org.apache.spark.sql.Dataset[T]#joinWith[U](other: Dataset[U], condition:
Column, joinType: String): Dataset[(T, U)]
{code}
which explicitly excludes left_anti and left_semi joins. That is
understandable, because the result would have a different signature.
I think this drawback is easy to fix, and I can think of two solutions:
- Extend the current joinWith to return null for the second (_2) item in the
tuple. Not ideal, as no one likes nulls, but workable: the client can handle
it by calling {code}.map(_._1){code} immediately afterwards.
- Add two new methods,
{code}org.apache.spark.sql.Dataset[T]#joinSemiWith[U](other: Dataset[U],
condition: Column): Dataset[T]{code} and
{code}org.apache.spark.sql.Dataset[T]#joinAntiWith[U](other: Dataset[U],
condition: Column): Dataset[T]{code}. This is much nicer, but adds two methods
to the API.
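
For illustration only, the second option can be approximated in user code with
an implicit class. This is a rough sketch, not part of the Spark API: the names
SemiAntiJoinSyntax and SemiAntiJoinOps are hypothetical, and it assumes an
implicit Encoder[T] is in scope so that the untyped result of Dataset.join can
be converted back to Dataset[T] with .as[T].

```scala
import org.apache.spark.sql.{Column, Dataset, Encoder}

// Hypothetical helper, not part of Spark: adds the two proposed methods
// to any Dataset[T] for which an Encoder[T] is available.
object SemiAntiJoinSyntax {
  implicit class SemiAntiJoinOps[T: Encoder](private val left: Dataset[T]) {

    // Keeps the rows of `left` that have at least one match in `other`.
    // A left_semi join outputs only the left side's columns, so the
    // untyped result can be read back as T.
    def joinSemiWith[U](other: Dataset[U], condition: Column): Dataset[T] =
      left.join(other, condition, "left_semi").as[T]

    // Keeps the rows of `left` that have no match in `other`.
    def joinAntiWith[U](other: Dataset[U], condition: Column): Dataset[T] =
      left.join(other, condition, "left_anti").as[T]
  }
}
```

Assuming two datasets users: Dataset[User] and orders: Dataset[Order] (both
hypothetical), usage after import SemiAntiJoinSyntax._ would look like
users.joinSemiWith(orders, users("id") === orders("userId")). Unlike a built-in
method, this goes through the untyped join and re-applies the encoder, so it
only shows the shape of the proposed signatures.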