Re: Problems with exact matches on non-tokenized fields...

Stefanos Karasavvidis Wed, 13 Nov 2002 11:46:17 -0800

I came across the same problem, and I think the FAQ entry you (Otis) propose should get a better title so that users can more easily find an answer to this problem.

Correct me if I'm wrong (and please forgive any wrong assumptions I may have made), but the problem is "how to query on a non-tokenized field?"

Problem explanation:
If a field is not tokenized, then it is not passed through the analyzer at indexing time, regardless of which analyzer is used (that's what I understand from looking into DocumentWriter.invertDocument()).
If you construct a query on this field with a given analyzer (for example with QueryParser.parse(query, field, analyzer)), the QueryParser does not know that the field is not tokenized and passes the query text through the analyzer anyway. The analyzer may alter the query term (for example by lowercasing it, or if the analyzer has a stemming algorithm), and the document then fails to match the query.
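
To make the mismatch concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: analyze(), indexUntokenized() and matches() are hypothetical stand-ins (a lowercasing "analyzer" and a set of indexed terms), assumed only for illustration.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Set;

final class UntokenizedFieldDemo {
    // Hypothetical stand-in for an analyzer that lowercases its input.
    // A real analyzer might also stem or split the text.
    static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    // An untokenized field is indexed as one term, exactly as given.
    static Set<String> indexUntokenized(String value) {
        return Collections.singleton(value);
    }

    // A term query matches only if the exact term is in the index.
    static boolean matches(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        Set<String> terms = indexUntokenized("POST");
        // Query text went through the analyzer: "post" != "POST".
        System.out.println(matches(terms, analyze("POST"))); // prints "false"
        // Query term used as-is: exact match.
        System.out.println(matches(terms, "POST"));          // prints "true"
    }
}
```

The indexed term never went through the analyzer, so any change the analyzer makes to the query text breaks the exact match.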

The solution:
The solution is to make sure that fields that aren't tokenized during indexing are not passed through the analyzer during searching. This can be done in two ways: either by writing an analyzer that handles this on a per-field basis, or by constructing a TermQuery for this field and adding it to the rest of the query.
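
The first approach boils down to choosing the analysis by field name. A rough sketch of that dispatch in plain Java, again without Lucene: the field set, defaultAnalyze() and analyze() are hypothetical names assumed for illustration, not part of any Lucene API.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Set;

final class PerFieldAnalysisSketch {
    // Hypothetical list of fields that were indexed without tokenization.
    static final Set<String> UNTOKENIZED_FIELDS = Collections.singleton("element");

    // Hypothetical stand-in for the normal (lowercasing) analyzer.
    static String defaultAnalyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    // Leave query text for untokenized fields untouched,
    // and analyze everything else as usual.
    static String analyze(String field, String text) {
        if (UNTOKENIZED_FIELDS.contains(field)) {
            return text;
        }
        return defaultAnalyze(text);
    }

    public static void main(String[] args) {
        System.out.println(analyze("element", "POST")); // prints "POST"
        System.out.println(analyze("body", "POST"));    // prints "post"
    }
}
```

The second approach skips this dispatch entirely: the program builds the clause for the untokenized field itself, so the analyzer never sees that term.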

Example:
put here the 2 examples from Doug

Stefanos

Otis Gospodnetic wrote:

Thanks, it's a FAQ entry now:

How do I write my own Analyzer?
http://www.jguru.com/faq/view.jsp?EID=1006122

Otis

--- Doug Cutting <[EMAIL PROTECTED]> wrote:

karl øie wrote:

I have a Lucene Document with a field named "element" which is stored and indexed but not tokenized. The value of the field is "POST" (uppercase). But the only way I can match the field is by entering "element:POST?" or "element:POST*" in the QueryParser class.

There are two ways to do this.

If this must be entered by users in the query string, then you need to use a non-lowercasing analyzer for this field. The way to do this, if you're currently using StandardAnalyzer, is to do something like:

  public class MyAnalyzer extends Analyzer {
    private Analyzer standard = new StandardAnalyzer();
    public TokenStream tokenStream(String field, final Reader reader) {
      if ("element".equals(field)) {        // don't tokenize
        return new CharTokenizer(reader) {
          protected boolean isTokenChar(char c) { return true; }
        };
      } else {                              // use standard analyzer
        return standard.tokenStream(field, reader);
      }
    }
  }

Analyzer analyzer = new MyAnalyzer();
Query query = queryParser.parse("... +element:POST", analyzer);

Alternately, if this query field is added by a program, then this can be done by bypassing the analyzer for this clause and building it directly instead:

Analyzer analyzer = new StandardAnalyzer();
BooleanQuery query = (BooleanQuery)queryParser.parse("...", analyzer);

// now add the element clause
query.add(new TermQuery(new Term("element", "POST")), true, false);

Perhaps this should become an FAQ...

Doug

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@;jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@;jakarta.apache.org>

