I'm sorry, but for anybody to help you here, you really need to be able to
provide a concise test case, like 10-20 lines of code, completely
self-contained. If you think you need a million documents to repro what you
claimed was a simple scenario, then you leave me very, very confused - and
unable to help you any further.
-- Jack Krupansky
-----Original Message-----
From: George Kelvin
Sent: Tuesday, January 29, 2013 2:43 PM
To: java-user@lucene.apache.org
Subject: Re: Questions about FuzzyQuery in Lucene 4.x
Hi Jack,
The problematic query is "scar"+"wads".
There are several (more than 10) documents in the data with the content
"star wars", so I think that query should be able to find all these
documents.
I tried to build a minimal test case, but I couldn't shrink the data much
and still reproduce the failure.
The smallest data set that still shows the failure so far has around 2
million documents.
However, I found a suspicious document with content "scor". If I remove it
from the 2 million documents data, that query can find all the "star wars"
documents. If I add it back, then the query can't find any.
I then reduced the data further to 1 million documents and added the
"scor" document back, but with that data the query still finds all the
"star wars" documents.
Is it possible that Lucene somehow fails to find all the valid terms within
the edit distance?
Thanks!
George
On Tue, Jan 29, 2013 at 10:02 AM, Jack Krupansky <j...@basetechnology.com> wrote:
I also noticed that you have "MUST" for your full string of fuzzy terms -
that means every one of them must appear in an indexed document for that
document to be matched. Is it possible that even one term was not in the
same indexed document?
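To make the MUST point concrete, here is a small stand-alone sketch in plain Java. It does not use Lucene at all: `MustClauseDemo`, `withinOneEdit`, and `matchesAll` are hypothetical names made up for illustration, and the model ignores how Lucene actually enumerates and scores candidate terms. It only shows the conjunction: a document is returned only if every query term has a match within one edit in that same document. The query terms are the "scar" and "wads" from elsewhere in this thread.

```java
import java.util.Arrays;
import java.util.List;

public class MustClauseDemo {
    /** True if a and b differ by at most one insertion, deletion, or substitution. */
    static boolean withinOneEdit(String a, String b) {
        if (Math.abs(a.length() - b.length()) > 1) return false;
        int i = 0, j = 0;
        boolean edited = false;
        while (i < a.length() && j < b.length()) {
            if (a.charAt(i) == b.charAt(j)) { i++; j++; continue; }
            if (edited) return false;                 // a second mismatch: too far apart
            edited = true;
            if (a.length() > b.length()) i++;         // skip the extra character in a
            else if (a.length() < b.length()) j++;    // skip the extra character in b
            else { i++; j++; }                        // same length: substitution
        }
        // Any leftover trailing character is allowed only if no edit was used yet.
        return !(edited && (i < a.length() || j < b.length()));
    }

    /** Models Occur.MUST on every clause: each query term needs a match in the same document. */
    static boolean matchesAll(List<String> queryTerms, List<String> docTerms) {
        for (String q : queryTerms) {
            boolean found = false;
            for (String d : docTerms) {
                if (withinOneEdit(q, d)) { found = true; break; }
            }
            if (!found) return false;                 // one missing term rejects the document
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("scar", "wads");
        // Both terms have a neighbor within one edit: the document matches.
        System.out.println(matchesAll(query, Arrays.asList("star", "wars")));
        // "wads" has no neighbor in this document: the whole query fails.
        System.out.println(matchesAll(query, Arrays.asList("star", "trek")));
    }
}
```

If a real index behaves differently from this model for the same literals, that difference is exactly what a small test case should capture.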
Try to provide a complete example that shows the input data and the query
- all the literals. In other words, construct a minimal test case that
shows the failure.
-- Jack Krupansky
-----Original Message-----
From: George Kelvin
Sent: Tuesday, January 29, 2013 12:28 PM
To: java-user@lucene.apache.org
Subject: Re: Questions about FuzzyQuery in Lucene 4.x
Hi Jack,
ed is set to 1 here and I have lowercased all the data and queries.
Regarding the indexed data factor you mentioned, can you elaborate more?
Thanks!
George
On Tue, Jan 29, 2013 at 9:10 AM, Jack Krupansky <j...@basetechnology.com> wrote:
That depends on the value of "ed", and the indexed data.
Another factor to take into consideration is that a case change ("Star"
vs. "star") also counts as an edit.
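As a rough illustration of the case point, the sketch below computes a plain Levenshtein distance in stand-alone Java. `EditDistanceDemo` and `distance` are hypothetical names, not Lucene APIs, and this is not Lucene's implementation: as far as I recall, FuzzyQuery in 4.x also treats a transposition as a single edit by default, which this sketch does not model. It is only meant to show that an uppercase letter is a different character and therefore costs one edit.

```java
public class EditDistanceDemo {
    /** Plain Levenshtein distance: each insertion, deletion, or substitution costs 1. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                // Characters are compared exactly, so 'S' and 's' differ.
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("Star", "star")); // case change only
        System.out.println(distance("scar", "star")); // one substitution
        System.out.println(distance("scar", "Star")); // substitution plus case change
    }
}
```

With a maximum of one edit, a query term that is already one substitution away from the indexed term has no budget left for a case difference, which is why lowercasing both the data and the queries matters.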
-- Jack Krupansky
-----Original Message-----
From: George Kelvin
Sent: Tuesday, January 29, 2013 11:49 AM
To: java-user@lucene.apache.org
Subject: Re: Questions about FuzzyQuery in Lucene 4.x
Hi Jack,
Thanks for your reply!
I don't think I passed the prefixLength parameter in.
Here is the code I used to build the FuzzyQuery:
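// Split the input on '+' and add one fuzzy clause per word.
String[] words = str.split("\\+");
BooleanQuery query = new BooleanQuery();
for (int i = 0; i < words.length; i++)
{
    Term t = new Term(field, words[i]);
    // ed is the maximum edit distance; no prefixLength is passed here.
    FuzzyQuery fq = new FuzzyQuery(t, ed);
    // MUST: every fuzzy clause has to match for a document to be returned.
    query.add(fq, BooleanClause.Occur.MUST);
}
int k = 10; // number of top hits to return
TopDocs results = searcher.search(query, k);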
Does it look right to you?
Thanks!
George
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org