[ https://issues.apache.org/jira/browse/FLINK-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700603#comment-16700603 ]
Stephan Ewen commented on FLINK-10993:
--------------------------------------
I would suggest focusing on what the behavior should be in the DataStream API
for unbounded streams, which is quite different from the example you give at
the beginning.
DataSet is most likely a simpler special case.
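
As a rough illustration of why the unbounded case may differ (an assumption
about the concern, not something spelled out in this issue): a Bloom filter is
sized up front for an expected number of entries, which is known for a bounded
DataSet but not for an unbounded stream. The sketch below is a minimal,
hypothetical filter in plain Java. SimpleBloomFilter and its methods are made
up for this example and are not Flink's internal BloomFilter or any proposed
API.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical illustration only; not Flink's internal BloomFilter. */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int expectedEntries, double fpp) {
        // Standard sizing: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
        double ln2 = Math.log(2);
        this.numBits = Math.max(1,
                (int) Math.ceil(-expectedEntries * Math.log(fpp) / (ln2 * ln2)));
        this.numHashes = Math.max(1,
                (int) Math.round((double) numBits / expectedEntries * ln2));
        this.bits = new BitSet(numBits);
    }

    /** Adds the key; returns true if the key was definitely not seen before. */
    boolean put(String key) {
        long h = hash64(key);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        boolean changed = false;
        for (int i = 0; i < numHashes; i++) {
            // Double hashing: derive the i-th bit position from two base hashes.
            int idx = Math.floorMod(h1 + i * h2, numBits);
            if (!bits.get(idx)) {
                bits.set(idx);
                changed = true;
            }
        }
        return changed;
    }

    /** Returns false only if the key was never added; true may be a false positive. */
    boolean mightContain(String key) {
        long h = hash64(key);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                return false;
            }
        }
        return true;
    }

    /** Fraction of bits set; this only grows as more keys are added. */
    double fillRatio() {
        return (double) bits.cardinality() / numBits;
    }

    // 64-bit FNV-1a, chosen here only to keep the example dependency-free.
    private static long hash64(String key) {
        long h = 0xcbf29ce484222325L;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }
}
```

With a bounded input, expectedEntries can match the data size and the target
false-positive rate holds. On an unbounded stream the fill ratio keeps rising
past the design point, so mightContain eventually answers true for almost every
key. A streaming API would presumably have to say how the filter is scoped, for
example per key, per window, or with some expiry.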
> Bring bloomfilter as a public API
> ---------------------------------
>
> Key: FLINK-10993
> URL: https://issues.apache.org/jira/browse/FLINK-10993
> Project: Flink
> Issue Type: New Feature
> Components: DataStream API
> Reporter: vinoyang
> Assignee: vinoyang
> Priority: Major
>
> Flink internally provides an implementation of BloomFilter, but only for
> internal optimization, and does not provide APIs for public access.
> Here is an earlier discussion on the user mailing list:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Bloom-filter-in-Flink-td10608.html
> Considering that many users need to "determine duplicates" in stream
> processing, I think it would make sense to provide such an API.
> In addition, Spark already provides BloomFilter as a public API:
> {code:scala}
> val bf = df.stat.bloomFilter("dd", dataLen, 0.01)
> val rightNum = rdd.map(x => (x.toInt, bf.mightContainString(x)))
> {code}
>
>