[ 
https://issues.apache.org/jira/browse/FLINK-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628246#comment-14628246
 ] 

ASF GitHub Bot commented on FLINK-2148:
---------------------------------------

Github user ggevay commented on the pull request:

    https://github.com/apache/flink/pull/910#issuecomment-121654093
  
    > -Why is the output of the distinct methods not a DataStream?
    
    They do output a DataStream. (<T> is their generic type parameter.)
    
    > -Why don't we have a distinct method that does not take a field position,
    > so that it counts the truly distinct elements in the stream?
    
    That's a good idea, I have added it now.


> Approximately calculate the number of distinct elements of a stream
> -------------------------------------------------------------------
>
>                 Key: FLINK-2148
>                 URL: https://issues.apache.org/jira/browse/FLINK-2148
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Streaming
>            Reporter: Gabor Gevay
>            Assignee: Gabor Gevay
>            Priority: Minor
>              Labels: statistics
>
> In the paper
> http://people.seas.harvard.edu/~minilek/papers/f0.pdf
> Kane et al. describe an optimal algorithm for estimating the number of 
> distinct elements in a data stream.
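
For readers unfamiliar with approximate distinct counting, here is a minimal
sketch of the general idea. It is NOT the algorithm from the Kane et al. paper
and NOT the code in the pull request; it swaps in the much simpler
K-Minimum-Values (KMV) estimator. The class name KmvDistinctCounter, its
methods, and the choice of hash mixer are all hypothetical and are not part of
the Flink API.

```java
import java.util.TreeSet;

/**
 * Hypothetical illustration (not Flink code): K-Minimum-Values estimator.
 * It keeps only the k smallest distinct hash values seen so far. If hashes
 * are spread uniformly, the k-th smallest one sits at roughly k/n of the
 * hash range, so n can be estimated from it.
 */
class KmvDistinctCounter<T> {
    private final int k;
    // The k smallest distinct 63-bit hashes observed so far, in sorted order.
    private final TreeSet<Long> smallest = new TreeSet<Long>();

    KmvDistinctCounter(int k) {
        if (k < 2) {
            throw new IllegalArgumentException("k must be at least 2");
        }
        this.k = k;
    }

    // splitmix64 finalizer: scrambles the (often poorly mixed) hashCode bits.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    void add(T element) {
        // Drop the sign bit so hashes are non-negative and order naturally.
        long h = mix(element.hashCode() + 0x9e3779b97f4a7c15L) >>> 1;
        if (smallest.size() < k) {
            smallest.add(h); // duplicates are ignored by the set
        } else if (h < smallest.last() && smallest.add(h)) {
            smallest.pollLast(); // evict the largest to stay at k entries
        }
    }

    double estimate() {
        if (smallest.size() < k) {
            // Fewer than k distinct hashes seen: the set size is the count
            // (up to hash collisions).
            return smallest.size();
        }
        // k-th smallest hash as a fraction of the 63-bit hash range.
        double kthFraction = smallest.last() / Math.pow(2, 63);
        return (k - 1) / kthFraction;
    }
}
```

Memory use is bounded by k regardless of stream length, and the relative
error shrinks roughly as 1/sqrt(k). Calling add() on every element counts
whole-element distinctness; passing only one extracted field to add() would
correspond to the field-position variants discussed above.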



