Brandon Dahler created SPARK-35739:
--------------------------------------
Summary: [Spark Sql] Add Java-compatible Dataset.join overloads
Key: SPARK-35739
URL: https://issues.apache.org/jira/browse/SPARK-35739
Project: Spark
Issue Type: Improvement
Components: Java API, SQL
Affects Versions: 3.0.0, 2.0.0
Reporter: Brandon Dahler
h2. Problem
When using Spark SQL with Java, the syntax required to call the following
two overloads is unnatural and not obvious to developers who haven't had to
interoperate with Scala before:
{code:java}
def join(right: Dataset[_], usingColumns: Seq[String]): DataFrame
def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame
{code}
Examples:
Java 11
{code:java}
Dataset<Row> dataset1 = ...;
Dataset<Row> dataset2 = ...;
// Overload with multiple usingColumns, no join type
dataset1
.join(dataset2, JavaConverters.asScalaBuffer(List.of("column", "column2")))
.show();
// Overload with multiple usingColumns and a join type
dataset1
.join(
dataset2,
JavaConverters.asScalaBuffer(List.of("column", "column2")),
"left")
.show();
{code}
Additionally, there is no overload that takes a single usingColumn and a
joinType, forcing the developer to use the Seq[String] overload regardless of
language.
Examples:
Scala
{code:java}
val dataset1: DataFrame = ...;
val dataset2: DataFrame = ...;
dataset1
.join(dataset2, Seq("column"), "left")
.show();
{code}
Java 11
{code:java}
Dataset<Row> dataset1 = ...;
Dataset<Row> dataset2 = ...;
dataset1
.join(dataset2, JavaConverters.asScalaBuffer(List.of("column")), "left")
.show();
{code}
h2. Proposed Improvement
Add 3 additional overloads to Dataset:
{code:java}
def join(right: Dataset[_], usingColumn: List[String]): DataFrame
def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame
def join(right: Dataset[_], usingColumn: List[String], joinType: String): DataFrame
{code}
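As a rough illustration of how the three proposed overloads could relate to one another, the plain-Java sketch below models them as thin wrappers over a single core method. It does not use Spark. The class name JoinOverloadSketch and the method joinCore are hypothetical stand-ins (joinCore for the existing Seq[String]-based join), and a real implementation would also need to convert the Java list to a Scala Seq before delegating. The "inner" default is an assumption that mirrors the inner equi-join performed by the existing join(right, usingColumns) overload.

```java
import java.util.List;

// Hypothetical model of the proposed overloads; NOT the Spark Dataset API.
public class JoinOverloadSketch {

    // Stand-in for the existing Seq[String]-based join. Instead of joining
    // data, it returns a description of the join it was asked to perform.
    public static String joinCore(List<String> usingColumns, String joinType) {
        return joinType + " join using " + String.join(", ", usingColumns);
    }

    // Proposed shape 1: several columns, default (inner) join type.
    public static String join(List<String> usingColumns) {
        return joinCore(usingColumns, "inner");
    }

    // Proposed shape 2: a single column plus an explicit join type.
    // The column is wrapped in a one-element list before delegating.
    public static String join(String usingColumn, String joinType) {
        return joinCore(List.of(usingColumn), joinType);
    }

    // Proposed shape 3: several columns plus an explicit join type.
    public static String join(List<String> usingColumns, String joinType) {
        return joinCore(usingColumns, joinType);
    }

    public static void main(String[] args) {
        System.out.println(join(List.of("column", "column2")));
        System.out.println(join("column", "left"));
        System.out.println(join(List.of("column", "column2"), "left"));
    }
}
```

With overloads of this shape, the Java call sites from the examples above would no longer need JavaConverters. The caller would pass a List<String> or a single String directly.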