[ 
https://issues.apache.org/jira/browse/LUCENE-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255434#comment-15255434
 ] 

Cao Manh Dat edited comment on LUCENE-6968 at 4/25/16 4:52 AM:
---------------------------------------------------------------

What a wonderful patch. The code is optimized, and I'm sure the index will 
be much smaller!

But the patch keeps only some of the lowest values for each position. Does 
that affect the formula below?
{code} Pr(h(s1) = h(s2)) = Jaccard(s1,s2) {code}
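
For reference, the property in that formula can be checked empirically. The 
following is a minimal Python sketch, not code from the patch: it uses plain 
MinHash (one minimum per hash function over the whole token set) and does not 
model the patch's scheme of keeping several lowest values per position. The 
names token_hash and minhash_match_rate, the salted BLAKE2b hash family, and 
whitespace tokenization are all assumptions made for illustration.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard similarity of two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def token_hash(seed, token):
    """Hypothetical hash family: BLAKE2b salted with the seed, read as a 64-bit int."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_match_rate(s1, s2, num_hashes=2000):
    """Fraction of hash functions whose minimum over s1 equals the minimum over s2.

    For a well-mixed hash family this fraction estimates Jaccard(s1, s2).
    """
    s1, s2 = set(s1), set(s2)
    matches = 0
    for seed in range(num_hashes):
        min1 = min(token_hash(seed, t) for t in s1)
        min2 = min(token_hash(seed, t) for t in s2)
        if min1 == min2:
            matches += 1
    return matches / num_hashes


if __name__ == "__main__":
    # Query and doc 1 from the issue description, split on whitespace (an assumption).
    query = "Solr is an open source search engine".split()
    doc1 = "Solr is an open source search engine based on Lucene".split()
    print("exact Jaccard:     ", jaccard(query, doc1))
    print("MinHash match rate:", minhash_match_rate(query, doc1))
```

Comparing the two printed numbers for the patched signature scheme, instead of 
plain MinHash, would be one way to see whether the formula still holds.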


was (Author: caomanhdat):
What a wonderful patch. The code is optimized, and I'm sure the index will 
be much smaller!

But the patch keeps only some of the lowest values for each position, so for 
a given expectedTruePositive, how can we compute the band size?

> LSH Filter
> ----------
>
>                 Key: LUCENE-6968
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6968
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Cao Manh Dat
>         Attachments: LUCENE-6968.patch, LUCENE-6968.patch, LUCENE-6968.patch
>
>
> I'm planning to implement LSH, which supports queries like this:
> {quote}
> Find similar documents that have a similarity score of 0.8 or higher with a 
> given document. The similarity measure can be cosine, Jaccard, Euclidean, etc.
> {quote}
> For example, given the following corpus:
> {quote}
> 1. Solr is an open source search engine based on Lucene
> 2. Solr is an open source enterprise search engine based on Lucene
> 3. Solr is an popular open source enterprise search engine based on Lucene
> 4. Apache Lucene is a high-performance, full-featured text search engine 
> library written entirely in Java
> {quote}
> We want to find documents that have a Jaccard score of at least 0.6 with 
> this doc:
> {quote}
> Solr is an open source search engine
> {quote}
> It will return only docs 1, 2 and 3 (MoreLikeThis will also return doc 4).



