GitHub user jparkie commented on a diff in the pull request:
https://github.com/apache/flink/pull/4652#discussion_r140823213
--- Diff: docs/dev/table/sql.md ---
@@ -2020,7 +2020,16 @@ COUNT(*)
<p>Returns the number of input rows.</p>
</td>
</tr>
-
+<tr>
+ <td>
+ {% highlight text %}
+CARDINALITY_COUNT(rsd, value)
--- End diff --
Would it be clearer to users if the function name included the word
"approximate", so that they understand the count is an estimate? For comparison,
Apache Spark calls it
`approx_count_distinct` (https://spark.apache.org/docs/2.2.0/api/java/org/apache/spark/sql/functions.html#approx_count_distinct-org.apache.spark.sql.Column-double-)
and Amazon Redshift exposes it as `APPROXIMATE COUNT(DISTINCT column)`
(http://docs.aws.amazon.com/redshift/latest/dg/r_COUNT.html).
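To illustrate why the name matters: functions like these are commonly backed by a HyperLogLog-style sketch, where an `rsd` (relative standard deviation) argument trades memory for accuracy, so the result is an estimate by design. The sketch below is a minimal, hypothetical Python illustration. It assumes that `rsd` in the proposed `CARDINALITY_COUNT(rsd, value)` has the same meaning as in Spark, and that the implementation is HyperLogLog-like. The names `registers_for_rsd` and `HyperLogLog` are illustrative only and do not describe this PR's code. It relies on the classic HyperLogLog error bound of roughly `1.04 / sqrt(m)` for `m` registers.

```python
import hashlib
import math


def registers_for_rsd(rsd: float) -> int:
    """Smallest power-of-two register count m with 1.04 / sqrt(m) <= rsd.

    Hypothetical helper: assumes rsd means relative standard deviation.
    At least 128 registers so the alpha approximation below is valid.
    """
    m_min = (1.04 / rsd) ** 2
    p = max(7, math.ceil(math.log2(m_min)))
    return 1 << p


class HyperLogLog:
    """Illustrative HyperLogLog sketch, not Flink's or Spark's implementation."""

    def __init__(self, rsd: float = 0.05):
        self.m = registers_for_rsd(rsd)
        self.p = self.m.bit_length() - 1  # m == 2**p
        self.registers = [0] * self.m

    def add(self, value) -> None:
        # Deterministic 64-bit hash (Python's built-in hash() is salted per run).
        h = int.from_bytes(hashlib.sha1(repr(value).encode()).digest()[:8], "big")
        w = 64 - self.p
        idx = h >> w                  # top p bits select a register
        rest = h & ((1 << w) - 1)     # remaining w bits
        rank = w - rest.bit_length() + 1  # 1-based position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Linear counting is more accurate for small cardinalities.
            return m * math.log(m / zeros)
        return raw


hll = HyperLogLog(rsd=0.05)
for i in range(20000):
    hll.add(i % 10000)  # 10,000 distinct values, each seen twice
print(round(hll.estimate()))  # close to, but generally not exactly, 10000
```

Duplicates never change the sketch and memory stays fixed at `m` small counters, which is why the result is only accurate to within roughly `rsd`. That is the property an "approximate" name would signal to users.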
---