Github user jparkie commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4652#discussion_r140823213

    --- Diff: docs/dev/table/sql.md ---
    @@ -2020,7 +2020,16 @@ COUNT(*)
             <p>Returns the number of input rows.</p>
           </td>
         </tr>
    -
    +<tr>
    +  <td>
    +    {% highlight text %}
    +CARDINALITY_COUNT(rsd, value)
    --- End diff --

    Would it be clearer to the user if the function name included the word "approximate", so that the user understands the count is an estimate? I see Apache Spark calls it `approx_count_distinct` (https://spark.apache.org/docs/2.2.0/api/java/org/apache/spark/sql/functions.html#approx_count_distinct-org.apache.spark.sql.Column-double-) and Redshift has it as `APPROXIMATE COUNT(DISTINCT column)` (http://docs.aws.amazon.com/redshift/latest/dg/r_COUNT.html).
---