Since there is already Average, the simplest change is the following:

$ git diff sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Functi
index 3dce6c1..920f95b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
@@ -184,6 +184,7 @@ object FunctionRegistry {
     expression[Last]("last"),
     expression[Last]("last_value"),
     expression[Max]("max"),
+    expression[Average]("mean"),
     expression[Min]("min"),
     expression[Stddev]("stddev"),
     expression[StddevPop]("stddev_pop"),
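To illustrate the idea behind that one-line diff, here is a minimal plain-Scala sketch (hypothetical names, not Spark's actual `FunctionRegistry` API): a single aggregate implementation registered under several SQL names, so that `mean` becomes an alias for the same code that backs `avg`.

```scala
// Illustrative mini-registry, NOT Spark's API: sketches how one aggregate
// implementation can be registered under multiple SQL function names.
object MiniRegistry {
  private var builders = Map.empty[String, Seq[Double] => Double]

  // Register `builder` under `name`, analogous to expression[Average]("mean").
  def register(name: String, builder: Seq[Double] => Double): Unit =
    builders += (name.toLowerCase -> builder)

  // Unknown names fail the way Spark reports "undefined function <name>".
  def lookup(name: String): Seq[Double] => Double =
    builders.getOrElse(name.toLowerCase,
      sys.error(s"undefined function $name"))
}

val average: Seq[Double] => Double = xs => xs.sum / xs.size
MiniRegistry.register("avg", average)
MiniRegistry.register("mean", average) // the alias the diff adds
```

With both names registered, looking up either `"avg"` or `"mean"` returns the same function, which is exactly why `avg` already works in the thread below while `mean` fails until the registry entry is added.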
FYI

On Wed, Oct 28, 2015 at 2:07 AM, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:

> I tried adding the aggregate functions in the registry and they work,
> other than mean, for which Ted has forwarded some code changes. I will try
> out those changes and update the status here.
>
> On Wed, Oct 28, 2015 at 9:03 AM, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>
>> Yup, avg works fine. So we have alternate functions to use in place of the
>> functions pointed out earlier. But my point is: are those original
>> aggregate functions not supposed to be used, am I using them in the wrong
>> way, or is it a bug, as I asked in my first mail?
>>
>> On Wed, Oct 28, 2015 at 3:20 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Have you tried using avg in place of mean?
>>>
>>> (1 to 5).foreach { i => val df = (1 to 1000).map(j => (j,
>>> s"str$j")).toDF("a", "b").save(s"/tmp/partitioned/i=$i") }
>>> sqlContext.sql("""
>>> CREATE TEMPORARY TABLE partitionedParquet
>>> USING org.apache.spark.sql.parquet
>>> OPTIONS (
>>>   path '/tmp/partitioned'
>>> )""")
>>> sqlContext.sql("""select avg(a) from partitionedParquet""").show()
>>>
>>> Cheers
>>>
>>> On Tue, Oct 27, 2015 at 10:12 AM, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>>>
>>>> So I tried @Reynold's suggestion. I could get countDistinct and
>>>> sumDistinct running, but mean and approxCountDistinct do not work. (I
>>>> guess I am using the wrong syntax for approxCountDistinct.) For mean, I
>>>> think the registry entry is missing. Can someone clarify that as well?
>>>>
>>>> On Tue, Oct 27, 2015 at 8:02 PM, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>>>>
>>>>> Will try in a while when I get back. I assume this applies to all
>>>>> functions other than mean. Also, countDistinct is defined along with all
>>>>> other SQL functions, so I don't get the "distinct is not part of the
>>>>> function name" part.
>>>>> On 27 Oct 2015 19:58, "Reynold Xin" <r...@databricks.com> wrote:
>>>>>
>>>>>> Try
>>>>>>
>>>>>> count(distinct columnname)
>>>>>>
>>>>>> In SQL, distinct is not part of the function name.
>>>>>>
>>>>>> On Tuesday, October 27, 2015, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>>>>>>
>>>>>>> Oops, seems I made a mistake. The error message is: Exception in
>>>>>>> thread "main" org.apache.spark.sql.AnalysisException: undefined function
>>>>>>> countDistinct
>>>>>>>
>>>>>>> On 27 Oct 2015 15:49, "Shagun Sodhani" <sshagunsodh...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi! I was trying out some aggregate functions in SparkSql and I
>>>>>>>> noticed that certain aggregate operators are not working. This includes:
>>>>>>>>
>>>>>>>> approxCountDistinct
>>>>>>>> countDistinct
>>>>>>>> mean
>>>>>>>> sumDistinct
>>>>>>>>
>>>>>>>> For example, using countDistinct results in an error saying
>>>>>>>> *Exception in thread "main" org.apache.spark.sql.AnalysisException:
>>>>>>>> undefined function cosh;*
>>>>>>>>
>>>>>>>> I had a similar issue with the cosh operator
>>>>>>>> <http://apache-spark-developers-list.1001551.n3.nabble.com/Exception-when-using-cosh-td14724.html>
>>>>>>>> some time back, and it turned out that it was not registered in the
>>>>>>>> registry:
>>>>>>>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>>>>>>>>
>>>>>>>> *I think it is the same issue again and would be glad to send
>>>>>>>> over a PR if someone can confirm that this is an actual bug and not some
>>>>>>>> mistake on my part.*
>>>>>>>>
>>>>>>>> Query I am using: SELECT countDistinct(`age`) as `data` FROM `table`
>>>>>>>> Spark Version: 10.4
>>>>>>>> SparkSql Version: 1.5.1
>>>>>>>>
>>>>>>>> I am using the standard example of a (name, age) schema (though I am
>>>>>>>> setting age as Double and not Int, as I am trying out maths functions).
>>>>>>>>
>>>>>>>> The entire error stack can be found here
>>>>>>>> <http://pastebin.com/G6YzQXnn>.
>>>>>>>>
>>>>>>>> Thanks!
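For reference, the syntax the thread converges on: in SQL, DISTINCT modifies the aggregate's argument rather than being part of the function name, i.e. `count(DISTINCT age)` and `sum(DISTINCT age)` instead of `countDistinct(age)` and `sumDistinct(age)`. A plain-Scala sketch of the semantics (no Spark required; the age values are made up for illustration):

```scala
// SQL:  SELECT count(DISTINCT age), sum(DISTINCT age) FROM table
// Plain-Scala equivalents over a made-up "age" column (Double, as in the
// original poster's schema):
val ages = Seq(21.0, 35.0, 21.0, 40.0, 35.0)

val countDistinct = ages.distinct.size // exact count of distinct values
val sumDistinct   = ages.distinct.sum  // sum over distinct values only
```

Both are exact; approx_count_distinct (the registered name for approxCountDistinct in later Spark versions) trades this exactness for constant memory via a sketch, which is why it is a separate function rather than another alias.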