Will try in a while when I get back. I assume this applies to all functions other than mean. Also, countDistinct is defined along with all the other SQL functions, so I don't get the "distinct is not part of the function name" part.

On 27 Oct 2015 19:58, "Reynold Xin" wrote:
> Try
>
> count(distinct columnname)
>
> In SQL distinct is not part of the function name.
>
> On Tuesday, October 27, 2015, Shagun Sodhani wrote:
>
>> Oops, seems I made a mistake. The error message is: Exception in thread
>> "main" org.apache.spark.sql.AnalysisException: undefined function
>> countDistinct
>>
>> On 27 Oct 2015 15:49, "Shagun Sodhani" wrote:
>>
>>> Hi! I was trying out some aggregate functions in SparkSql and I noticed
>>> that certain aggregate operators are not working. This includes:
>>>
>>> approxCountDistinct
>>> countDistinct
>>> mean
>>> sumDistinct
>>>
>>> For example, using countDistinct results in an error saying
>>> *Exception in thread "main" org.apache.spark.sql.AnalysisException:
>>> undefined function cosh;*
>>>
>>> I had a similar issue with the cosh operator
>>> <http://apache-spark-developers-list.1001551.n3.nabble.com/Exception-when-using-cosh-td14724.html>
>>> some time back as well, and it turned out that it was not registered in
>>> the registry:
>>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>>>
>>> *I think it is the same issue again and would be glad to send over a
>>> PR if someone can confirm that this is an actual bug and not some
>>> mistake on my part.*
>>>
>>> Query I am using: SELECT countDistinct(`age`) as `data` FROM `table`
>>> Spark Version: 10.4
>>> SparkSql Version: 1.5.1
>>>
>>> I am using the standard example of a (name, age) schema (though I am
>>> setting age as Double and not Int, as I am trying out maths functions).
>>>
>>> The entire error stack can be found here <http://pastebin.com/G6YzQXnn>.
>>>
>>> Thanks!
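
For anyone following the thread, here is a rough sketch of the "distinct is not part of the function name" point. It is not Spark: it assumes SQLite (through Python's built-in sqlite3 module) as a stand-in SQL engine, and the `people` table and its rows are made up for illustration. It only shows the general SQL syntax, where DISTINCT is written inside the aggregate call, while a camel-case name like countDistinct (which in Spark belongs to the DataFrame functions API) is rejected as an unknown SQL function.

```python
import sqlite3

# Assumption: SQLite is only a stand-in for Spark SQL, used to show the syntax.
conn = sqlite3.connect(":memory:")

# Hypothetical (name, age) table with age stored as a floating-point value.
conn.execute("CREATE TABLE people (name TEXT, age REAL)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Michael", 29.0), ("Andy", 30.0), ("Justin", 19.0), ("Ann", 30.0)],
)

# DISTINCT is a modifier inside the aggregate call, not part of its name.
distinct_ages = conn.execute(
    "SELECT COUNT(DISTINCT age) AS data FROM people"
).fetchone()[0]

# The same pattern covers the sumDistinct case.
sum_distinct_ages = conn.execute(
    "SELECT SUM(DISTINCT age) AS data FROM people"
).fetchone()[0]

# A function literally named countDistinct is not defined in this SQL dialect,
# so the engine refuses the query.
try:
    conn.execute("SELECT countDistinct(age) AS data FROM people")
    camel_case_error = None
except sqlite3.OperationalError as exc:
    camel_case_error = str(exc)

print(distinct_ages)
print(sum_distinct_ages)
print(camel_case_error)
```

Whether a given name is accepted in Spark SQL itself depends on what is registered in FunctionRegistry for the version in use, so the sketch above should be read as an illustration of the syntax only.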
