[
https://issues.apache.org/jira/browse/FLINK-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627941#comment-14627941
]
ASF GitHub Bot commented on FLINK-2148:
---------------------------------------
Github user gyfora commented on the pull request:
https://github.com/apache/flink/pull/910#issuecomment-121592253
I am wondering about two things; maybe I didn't catch something, as I was
just scrolling through:
- Why is the output of the distinct methods not DataStream<Long>?
- Why don't we have a distinct method that does not take a field position,
and so counts the true distinct elements in the stream?
> Approximately calculate the number of distinct elements of a stream
> -------------------------------------------------------------------
>
> Key: FLINK-2148
> URL: https://issues.apache.org/jira/browse/FLINK-2148
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Reporter: Gabor Gevay
> Assignee: Gabor Gevay
> Priority: Minor
> Labels: statistics
>
> In the paper
> http://people.seas.harvard.edu/~minilek/papers/f0.pdf
> Kane et al. describe an optimal algorithm for estimating the number of
> distinct elements in a data stream.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)