So I tried @Reynold's suggestion. I could get countDistinct and sumDistinct running, but mean and approxCountDistinct still do not work. (I suspect I am using the wrong syntax for approxCountDistinct.) For mean, I think the registry entry is missing. Can someone clarify that as well?
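To make the syntax point from the quoted reply concrete, here is a minimal sketch of the SQL spelling. It is an illustration only: it uses Python's built-in sqlite3 module instead of Spark SQL, and the table `people` and its rows are made up. Spark's function registry and error messages differ, so this shows only the standard-SQL form, in which DISTINCT is a modifier inside the call and not part of the function name.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Spark SQL here,
# purely to show standard SQL syntax. The table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age REAL)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Michael", 29.0), ("Andy", 30.0), ("Justin", 19.0), ("Sam", 30.0)],
)

# DISTINCT goes inside the parentheses of the ordinary aggregate.
distinct_count, distinct_sum, mean_age = conn.execute(
    "SELECT COUNT(DISTINCT age), SUM(DISTINCT age), AVG(age) FROM people"
).fetchone()

# A camel-case name such as countDistinct is not a SQL function here,
# so the engine rejects the query.
camel_case_error = None
try:
    conn.execute("SELECT countDistinct(age) FROM people")
except sqlite3.OperationalError as exc:
    camel_case_error = str(exc)

print(distinct_count, distinct_sum, mean_age)
print(camel_case_error)
```

The rejected camel-case call is only loosely analogous to the AnalysisException in the thread. Whether mean and approxCountDistinct resolve in Spark SQL 1.5.1 is a separate question that this sketch does not answer.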
On Tue, Oct 27, 2015 at 8:02 PM, Shagun Sodhani <[email protected]> wrote:

> Will try in a while when I get back. I assume this applies to all
> functions other than mean. Also countDistinct is defined along with all
> other SQL functions. So I don't get "distinct is not part of function name"
> part.
>
> On 27 Oct 2015 19:58, "Reynold Xin" <[email protected]> wrote:
>
>> Try
>>
>> count(distinct columnane)
>>
>> In SQL distinct is not part of the function name.
>>
>> On Tuesday, October 27, 2015, Shagun Sodhani <[email protected]>
>> wrote:
>>
>>> Oops seems I made a mistake. The error message is : Exception in thread
>>> "main" org.apache.spark.sql.AnalysisException: undefined function
>>> countDistinct
>>>
>>> On 27 Oct 2015 15:49, "Shagun Sodhani" <[email protected]> wrote:
>>>
>>>> Hi! I was trying out some aggregate functions in SparkSql and I
>>>> noticed that certain aggregate operators are not working. This includes:
>>>>
>>>> approxCountDistinct
>>>> countDistinct
>>>> mean
>>>> sumDistinct
>>>>
>>>> For example using countDistinct results in an error saying
>>>> *Exception in thread "main" org.apache.spark.sql.AnalysisException:
>>>> undefined function cosh;*
>>>>
>>>> I had a similar issue with cosh operator
>>>> <http://apache-spark-developers-list.1001551.n3.nabble.com/Exception-when-using-cosh-td14724.html>
>>>> as well some time back and it turned out that it was not registered in the
>>>> registry:
>>>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>>>>
>>>> *I* *think it is the same issue again and would be glad to send over a
>>>> PR if someone can confirm if this is an actual bug and not some mistake on
>>>> my part.*
>>>>
>>>> Query I am using: SELECT countDistinct(`age`) as `data` FROM `table`
>>>> Spark Version: 10.4
>>>> SparkSql Version: 1.5.1
>>>>
>>>> I am using the standard example of (name, age) schema (though I am
>>>> setting age as Double and not Int as I am trying out maths functions).
>>>>
>>>> The entire error stack can be found here <http://pastebin.com/G6YzQXnn>.
>>>>
>>>> Thanks!
