Hi All,

I am working with Datasets in 1.6.1, and eventually 2.0 when it's
released.

I am running the aggregate code below on a dataset whose rows have a
field uid:

ds.groupBy(_.uid).count()
// res0: org.apache.spark.sql.Dataset[(String, Long)] = [_1: string, _2: bigint]

This works as expected; however, attempting to run a select on the
result fails to compile:

ds.groupBy(_.uid).count().select(_._1)
// error: missing parameter type for expanded function ((x$2) => x$2._1)

I have tried several variants, but nothing seems to work. Below is the
equivalent DataFrame code, which works as expected:

df.groupBy("uid").count().select("uid")
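In case it helps with debugging, here is a minimal plain-Scala sketch (no Spark required) of why I think the compiler rejects the placeholder lambda: if the typed select expects column arguments rather than a function, the placeholder `_._1` has no expected function type to infer a parameter type from, whereas a map-style signature fixes the lambda's type. The `Col`, `select`, and `map` names below are hypothetical stand-ins, not the actual Spark API:

```scala
// Hypothetical stand-ins sketching the type-inference issue; not Spark code.
object InferenceDemo {
  case class Col(name: String) // stand-in for a column argument

  // Column-based "select": a lambda doesn't match Col, so the compiler
  // reports "missing parameter type" for the expanded function.
  def select(c: Col): String = c.name

  // Function-based "map": the signature supplies the lambda's parameter type.
  def map[U](f: ((String, Long)) => U): U = f(("uid1", 1L))

  def main(args: Array[String]): Unit = {
    // select(_._1)  // would not compile: Col is not a function type,
    //               // so "_._1" cannot be given a parameter type
    println(map(_._1)) // compiles: the expected type fixes the parameter
  }
}
```

If that reading is right, a map over the result (rather than a select) may be the typed way to pull out the key, but I'd appreciate confirmation.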

Thanks!
-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn: https://www.linkedin.com/in/pedrorodriguezscience
