On Tue, Jan 29, 2013 at 2:43 PM, George Kelvin
<george.kelvin...@gmail.com> wrote:
> Hi Jack,
>
> The problematic query is "scar"+"wads".
>
> There are several (more than 10) documents in the data with the content
> "star wars", so I think that query should be able to find all these
> documents.
>
> I was trying to provide a minimal test case, but I couldn't reduce the size
> of data showing the failure.
>
> The size of the minimal data showing the failure I got so far is around 2
> million.
>
> However, I found a suspicious document with content "scor". If I remove it
> from the 2 million documents data, that query can find all the "star wars"
> documents. If I add it back, then the query can't find any.

Hmm, maybe try increasing maxExpansions (one of FuzzyQuery's constructors takes it).

By default it's 50, meaning we only enumerate the top 50 terms within edit
distance 1, so it could be that "star" is falling out of the top 50?
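
To make the cut-off idea concrete, here is a toy sketch in plain Java. It
does not use Lucene. The class name, the term list, the substitution-only
matching and the alphabetical tie-break are all assumptions made up for
illustration, and FuzzyQuery's real ranking may differ. It only shows how a
cap on the number of expanded terms can push a valid match like "star" out
of the set, and how one extra term like "scor" can tip the balance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ExpansionCapDemo {

    // True if a and b have equal length and differ in exactly one position.
    // This is a substitution-only stand-in for "edit distance 1".
    public static boolean oneSubstitutionApart(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        int diff = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                diff++;
            }
        }
        return diff == 1;
    }

    // Collect the index terms that match, order them, keep at most maxExpansions.
    // ASSUMPTION: ties are broken alphabetically here, purely for illustration.
    public static List<String> expand(String queryTerm, List<String> indexTerms,
                                      int maxExpansions) {
        List<String> candidates = new ArrayList<>();
        for (String term : indexTerms) {
            if (oneSubstitutionApart(queryTerm, term)) {
                candidates.add(term);
            }
        }
        Collections.sort(candidates);
        int kept = Math.min(maxExpansions, candidates.size());
        return new ArrayList<>(candidates.subList(0, kept));
    }

    public static void main(String[] args) {
        // Hypothetical index terms, all one substitution away from "scar".
        List<String> withScor = Arrays.asList(
            "scab", "scam", "scan", "scat", "scor", "sear", "soar", "spar", "star");
        List<String> withoutScor = Arrays.asList(
            "scab", "scam", "scan", "scat", "sear", "soar", "spar", "star");

        // Cap of 8: nine candidates compete, so the last one ("star") is dropped.
        System.out.println(expand("scar", withScor, 8).contains("star"));
        // Same cap, "scor" removed: only eight candidates, "star" survives.
        System.out.println(expand("scar", withoutScor, 8).contains("star"));
        // Larger cap: "star" survives even with "scor" present.
        System.out.println(expand("scar", withScor, 50).contains("star"));

        // In Lucene 4.x the corresponding knob is the maxExpansions argument of
        // FuzzyQuery(Term term, int maxEdits, int prefixLength,
        //            int maxExpansions, boolean transpositions).
    }
}
```

If the real index behaves like this toy, passing a larger maxExpansions to
that constructor should bring the "star wars" documents back, at the cost of
a slower query.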

Mike McCandless

http://blog.mikemccandless.com

