Dongjoon Hyun created SPARK-15807:
-------------------------------------
Summary: Support varargs for distinct/dropDuplicates in Dataset/DataFrame
Key: SPARK-15807
URL: https://issues.apache.org/jira/browse/SPARK-15807
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Dongjoon Hyun
This issue adds varargs overloads of `distinct` and `dropDuplicates` to
`Dataset`/`DataFrame`. Currently, `distinct` takes no arguments, and
`dropDuplicates` accepts column names only as a `Seq` or an `Array`.
{code}
scala> val ds = spark.createDataFrame(Seq(("a", 1), ("b", 2), ("a", 2)))
ds: org.apache.spark.sql.DataFrame = [_1: string, _2: int]
scala> ds.dropDuplicates(Seq("_1", "_2"))
res0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: string, _2: int]
scala> ds.dropDuplicates("_1", "_2")
<console>:26: error: overloaded method value dropDuplicates with alternatives:
  (colNames: Array[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] <and>
  (colNames: Seq[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] <and>
  ()org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
cannot be applied to (String, String)
ds.dropDuplicates("_1", "_2")
^
scala> ds.distinct("_1", "_2")
<console>:26: error: too many arguments for method distinct: ()org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
ds.distinct("_1", "_2")
{code}
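One possible shape for the requested overload is sketched below, outside of Spark. `MiniFrame` is a hypothetical stand-in for `Dataset[Row]`, not Spark's class, and the `(col1: String, cols: String*)` signature is an assumption for illustration. Requiring at least one explicit `String` keeps the varargs overload from clashing with the existing no-argument and `Seq` forms.

```scala
import scala.annotation.varargs

// Hypothetical stand-in for Dataset[Row]; this is not Spark's implementation.
final case class MiniFrame(columns: Seq[String], rows: Seq[Seq[Any]]) {

  // Existing-style overload: keep the first row seen for each distinct key.
  def dropDuplicates(colNames: Seq[String]): MiniFrame = {
    val idx = colNames.map { name =>
      val i = columns.indexOf(name)
      require(i >= 0, s"unknown column: $name")
      i
    }
    // LinkedHashMap preserves insertion order, so surviving rows keep their order.
    val seen = scala.collection.mutable.LinkedHashMap.empty[Seq[Any], Seq[Any]]
    rows.foreach { r =>
      val key = idx.map(r)
      if (!seen.contains(key)) seen(key) = r
    }
    copy(rows = seen.values.toSeq)
  }

  // Assumed varargs overload: delegates to the Seq version.
  // @varargs also emits a Java-friendly (String, String[]) bridge method.
  @varargs
  def dropDuplicates(col1: String, cols: String*): MiniFrame =
    dropDuplicates(col1 +: cols)

  // Existing-style no-argument overload: deduplicate on all columns.
  def dropDuplicates(): MiniFrame = dropDuplicates(columns)
}
```

With the three rows from the example above, `MiniFrame(Seq("_1", "_2"), Seq(Seq("a", 1), Seq("b", 2), Seq("a", 2))).dropDuplicates("_1")` compiles and keeps two rows, while the `Seq` and no-argument calls resolve exactly as before.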