GitHub user jparkie commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4652#discussion_r140823213
  
    --- Diff: docs/dev/table/sql.md ---
    @@ -2020,7 +2020,16 @@ COUNT(*)
             <p>Returns the number of input rows.</p>
           </td>
         </tr>
    -
    +<tr>
    +      <td>
    +        {% highlight text %}
    +CARDINALITY_COUNT(rsd, value)
    --- End diff --
    
    Would it be clearer to include the word "approximate" in the function
    name, so that the user understands the count is an estimate? Apache Spark
    calls it `approx_count_distinct`
    (https://spark.apache.org/docs/2.2.0/api/java/org/apache/spark/sql/functions.html#approx_count_distinct-org.apache.spark.sql.Column-double-)
    and Redshift has it as `APPROXIMATE COUNT(DISTINCT column)`
    (http://docs.aws.amazon.com/redshift/latest/dg/r_COUNT.html).

