Yup, avg works well. So we have alternate functions to use in place of the functions pointed out earlier. But my point is: are those original aggregate functions not supposed to be used, am I using them in the wrong way, or is it a bug, as I asked in my first mail?
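To make the contrast concrete, here is a minimal sketch of the two ways these aggregates can be invoked. It assumes a SQLContext named sqlContext and a DataFrame df with the (name, age) schema from the first mail, registered as the temporary table `table`; none of those bindings are shown in this thread.

import org.apache.spark.sql.functions.{countDistinct, mean}

// SQL path: only names registered in FunctionRegistry resolve here, so
// use avg(...) and count(DISTINCT ...) rather than mean/countDistinct.
sqlContext.sql("SELECT avg(`age`), count(DISTINCT `age`) FROM `table`").show()

// DataFrame API path: mean and countDistinct live in
// org.apache.spark.sql.functions and do not go through the SQL parser.
df.agg(mean("age"), countDistinct("age")).show()

Both calls should return the same numbers; the difference is only in which names each layer accepts.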
On Wed, Oct 28, 2015 at 3:20 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Have you tried using avg in place of mean ?
>
> (1 to 5).foreach { i => val df = (1 to 1000).map(j => (j,
> s"str$j")).toDF("a", "b").save(s"/tmp/partitioned/i=$i") }
> sqlContext.sql("""
> CREATE TEMPORARY TABLE partitionedParquet
> USING org.apache.spark.sql.parquet
> OPTIONS (
>   path '/tmp/partitioned'
> )""")
> sqlContext.sql("""select avg(a) from partitionedParquet""").show()
>
> Cheers
>
> On Tue, Oct 27, 2015 at 10:12 AM, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>
>> So I tried @Reynold's suggestion. I could get countDistinct and
>> sumDistinct running, but mean and approxCountDistinct do not work. (I
>> guess I am using the wrong syntax for approxCountDistinct.) For mean, I
>> think the registry entry is missing. Can someone clarify that as well?
>>
>> On Tue, Oct 27, 2015 at 8:02 PM, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>>
>>> Will try in a while when I get back. I assume this applies to all
>>> functions other than mean. Also, countDistinct is defined along with all
>>> other SQL functions, so I don't get the "distinct is not part of the
>>> function name" part.
>>> On 27 Oct 2015 19:58, "Reynold Xin" <r...@databricks.com> wrote:
>>>
>>>> Try
>>>>
>>>> count(distinct columnname)
>>>>
>>>> In SQL, distinct is not part of the function name.
>>>>
>>>> On Tuesday, October 27, 2015, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>>>>
>>>>> Oops, seems I made a mistake. The error message is: Exception in
>>>>> thread "main" org.apache.spark.sql.AnalysisException: undefined function
>>>>> countDistinct
>>>>> On 27 Oct 2015 15:49, "Shagun Sodhani" <sshagunsodh...@gmail.com> wrote:
>>>>>
>>>>>> Hi! I was trying out some aggregate functions in SparkSql and I
>>>>>> noticed that certain aggregate operators are not working. This includes:
>>>>>>
>>>>>> approxCountDistinct
>>>>>> countDistinct
>>>>>> mean
>>>>>> sumDistinct
>>>>>>
>>>>>> For example, using countDistinct results in an error saying
>>>>>> *Exception in thread "main" org.apache.spark.sql.AnalysisException:
>>>>>> undefined function cosh;*
>>>>>>
>>>>>> I had a similar issue with the cosh operator
>>>>>> <http://apache-spark-developers-list.1001551.n3.nabble.com/Exception-when-using-cosh-td14724.html>
>>>>>> some time back as well, and it turned out that it was not registered in
>>>>>> the registry:
>>>>>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>>>>>>
>>>>>> *I think it is the same issue again and would be glad to send over
>>>>>> a PR if someone can confirm that this is an actual bug and not some
>>>>>> mistake on my part.*
>>>>>>
>>>>>> Query I am using: SELECT countDistinct(`age`) as `data` FROM `table`
>>>>>> Spark Version: 10.4
>>>>>> SparkSql Version: 1.5.1
>>>>>>
>>>>>> I am using the standard example of a (name, age) schema (though I am
>>>>>> setting age as Double and not Int, as I am trying out maths functions).
>>>>>>
>>>>>> The entire error stack can be found here
>>>>>> <http://pastebin.com/G6YzQXnn>.
>>>>>>
>>>>>> Thanks!
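For anyone who wants to reproduce the registry behaviour discussed above, a rough probe along these lines (hypothetical code, not from the thread; it assumes the same sqlContext and registered temp table `table` as in the sketch earlier) shows which SQL-level names the analyzer resolves and which raise the "undefined function" AnalysisException:

import org.apache.spark.sql.AnalysisException

Seq("avg", "mean", "countDistinct", "sumDistinct").foreach { fn =>
  try {
    // Touching the analyzed plan is enough to trigger resolution; no job runs.
    sqlContext.sql(s"SELECT $fn(`age`) FROM `table`").queryExecution.analyzed
    println(s"$fn: resolves")
  } catch {
    case e: AnalysisException => println(s"$fn: ${e.getMessage}")
  }
}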