Re: Using Lucene as a Document Comparison Tool

Michael Sokolov Fri, 13 Dec 2019 09:40:45 -0800

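Have you tried making a BooleanQuery with a TermQuery for every word in the
query document, each added as an optional (Occur.SHOULD) clause? A document
then needs to contain only one of the terms to match, so you will get a lot
of matches, ranked according to the similarity.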
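
A minimal sketch of that suggestion, in case it helps. The class name, the
"body" field, and the sample terms are made up for illustration, and the
terms are assumed to have already been run through the same analyzer you
used at index time:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class OptionalTermsQuery {

    // Builds a BooleanQuery with one optional (SHOULD) TermQuery per distinct term.
    // Assumption: `terms` are already analyzed (tokenized, lowercased, ...) the
    // same way as the indexed field, because TermQuery does no analysis itself.
    static BooleanQuery build(String field, List<String> terms) {
        // Drop repeated words so each term contributes a single clause.
        Set<String> distinct = new LinkedHashSet<>(terms);
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        for (String text : distinct) {
            builder.add(new TermQuery(new Term(field, text)), BooleanClause.Occur.SHOULD);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        // Hypothetical terms standing in for the words of the query document.
        List<String> terms = Arrays.asList("lucene", "document", "similarity", "lucene");
        BooleanQuery query = build("body", terms);
        System.out.println(query);
    }
}
```

You would pass the resulting query to IndexSearcher.search(query, n) as
usual. With 10 or so phrases of 5-10 terms each you stay well under the
default limit of 1024 clauses per BooleanQuery.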


On Thu, Dec 12, 2019 at 10:47 AM John Brown <brown.j...@temple.edu> wrote:
>
> Hi,
>
> I have some questions about how to use Lucene for the specific purpose of
> finding document similarities. Lucene seems to have classes that were made
> for this, including ClassicSimilarity and BM25Similarity. However, I’m
> fumbling a bit when it comes to implementing them.
>
> From what I understand, to use these classes you simply set the similarity
> of your IndexWriter and IndexSearcher, then submit a query. The documents
> returned from your query should be ordered from highest to lowest
> similarity.
>
> My initial thought was to just use a phrase query to hold the "document" I
> want to find similarities to, but phrase queries are limited in that they
> will only return results that are deemed to fall within a certain slop
> value. Term/Boolean queries are similarly limited in that they allow
> documents to be sorted only if they contain all the terms in the query.
>
> Ideally, I’d like to submit a query that would essentially be a document
> itself. Each of my queries contains 10 or so phrases, each of which
> contains 5-10 terms. I would like to compare this query with all the
> documents in my index to see which is the most similar and which is the
> least similar. I feel as if there is an easy way to do this that I'm
> missing; after all, I essentially just want to remove a step from the
> process. Any help would be much appreciated.
>
>
> Thank you,
>
> -John B
