> their name?
yes it does, but look at the question's title
"How do I write my own Analyzer?"
if someone has a problem with a non-tokenized field (which was the problem of the mail thread that started this) then he doesn't know that he has to write a custom analyzer, and so he won't be able to find the correct faq entry.
Moreover, the second solution Doug has proposed suits some cases better and should be included, too. (Doug described both solutions in a mail to the users list on 27/9/2002, 9:24 p.m.)
I still think that there should be a faq entry as I propose in my previous email.
Moreover, there should be an addition to the faq entry
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q15
It states there that it is important to use the same analyzer during indexing and searching. Again, this may lead to problems if a field is not tokenized: during indexing it will _not_ get passed through the analyzer, but during searching it gets passed through. If the analyzer does not treat that field as a special case, there will be a problem.
I don't know, maybe I'm missing something here, but it seems obvious to me that non-tokenized fields in conjunction with analyzers produce problems which should be mentioned in the documentation, FAQ, etc.
Stefanos
Otis Gospodnetic wrote:
Not sure which FAQ entry you are referring to.
This one http://www.jguru.com/faq/view.jsp?EID=1006122 ?
Doesn't that one do just that - treats fields differently, based on
their name?
Otis
--- Stefanos Karasavvidis <[EMAIL PROTECTED]> wrote:
I came across the same problem and I think that the faq entry you (Otis) propose should get a better title so that users can more easily find an answer to this problem.
Correct me if I'm wrong (and please forgive any wrong assumptions I
may have made), but the problem is "how to query on a non-tokenized
field?"
Problem explanation:
If a field is not tokenized, then it is not passed through the
analyzer, regardless of which analyzer is used (that's what I understand from
looking into DocumentWriter.invertDocument()).
If you construct a query on this field with a given analyzer (for example with QueryParser.parse(query, field, analyzer)), the query parser does not know that this field is not tokenized and passes
it through the analyzer. The analyzer may alter the query (for example
if the analyzer has a stemming algorithm), and the document then does not
match the query.
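The mismatch described above can be reproduced without Lucene at all. The following is only a toy model, assuming a lowercasing "analyzer" and an in-memory term set per field; the class and method names (UntokenizedFieldDemo, analyze, searchAnalyzed, searchExact) are made up for illustration and are not part of Lucene's API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy model of the index/query mismatch; this is NOT Lucene code. */
class UntokenizedFieldDemo {
    // field name -> set of terms indexed for that field
    private final Map<String, Set<String>> index = new HashMap<>();

    /** Stand-in for an analyzer: lowercases and splits on whitespace. */
    static String[] analyze(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    /** Indexes a value; untokenized fields skip the analyzer entirely. */
    void add(String field, String value, boolean tokenized) {
        Set<String> terms = index.computeIfAbsent(field, f -> new HashSet<>());
        if (tokenized) {
            for (String t : analyze(value)) {
                terms.add(t);
            }
        } else {
            terms.add(value); // kept verbatim as one single term
        }
    }

    /** Query path that always analyzes, like a field-unaware query parser. */
    boolean searchAnalyzed(String field, String queryText) {
        Set<String> terms = index.getOrDefault(field, new HashSet<>());
        for (String t : analyze(queryText)) {
            if (!terms.contains(t)) {
                return false;
            }
        }
        return true;
    }

    /** Query path that bypasses the analyzer and looks up the exact term. */
    boolean searchExact(String field, String term) {
        return index.getOrDefault(field, new HashSet<>()).contains(term);
    }

    public static void main(String[] args) {
        UntokenizedFieldDemo demo = new UntokenizedFieldDemo();
        demo.add("element", "POST", false);
        // The analyzed query looks for "post", but the index holds "POST".
        System.out.println(demo.searchAnalyzed("element", "POST"));
        System.out.println(demo.searchExact("element", "POST"));
    }
}
```

In this model the analyzed lookup misses because the query term was lowercased while the indexed term was not; the exact lookup finds it.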
The solution:
The solution is to make sure that fields that aren't tokenized during
indexing are not passed through the analyzer during searching. This
can be done in 2 ways: either by making an analyzer that takes care of
this according to the field, or by constructing a TermQuery with this
field and adding it to the rest of the query.
Example:
put here the 2 examples from Doug
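Until Doug's two examples are pasted in, the first approach (deciding per field whether to analyze) can be sketched in plain Java. This is a hedged illustration only: PerFieldAnalysis and analyzeTerm are hypothetical names, and the "default analysis" is just a function on strings rather than a real Lucene Analyzer.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

/** Toy per-field analysis dispatch; names are hypothetical, not Lucene's API. */
class PerFieldAnalysis {
    private final Set<String> untokenizedFields;
    private final Function<String, String> defaultAnalysis;

    PerFieldAnalysis(Set<String> untokenizedFields,
                     Function<String, String> defaultAnalysis) {
        this.untokenizedFields = untokenizedFields;
        this.defaultAnalysis = defaultAnalysis;
    }

    /** Returns the query term to look up for the given field. */
    String analyzeTerm(String field, String text) {
        if (untokenizedFields.contains(field)) {
            return text; // same treatment as at index time: left untouched
        }
        return defaultAnalysis.apply(text);
    }

    public static void main(String[] args) {
        PerFieldAnalysis analysis = new PerFieldAnalysis(
                Collections.singleton("element"),
                s -> s.toLowerCase(Locale.ROOT));
        System.out.println(analysis.analyzeTerm("element", "POST"));
        System.out.println(analysis.analyzeTerm("body", "POST"));
    }
}
```

The point of the design is that the list of untokenized fields lives in one place, so the search side cannot drift from what was done at index time.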
Stefanos
Otis Gospodnetic wrote:
Thanks, it's a FAQ entry now:
How do I write my own Analyzer?
http://www.jguru.com/faq/view.jsp?EID=1006122
Otis
--- Doug Cutting <[EMAIL PROTECTED]> wrote:
karl øie wrote:
I have a Lucene Document with a field named "element" which is stored
and indexed but not tokenized. The value of the field is "POST" (uppercase). But the only way i can match the field is by entering
"element:POST?" or "element:POST*" in the QueryParser class.
There are two ways to do this.
If this must be entered by users in the query string, then you need
to use a non-lowercasing analyzer for this field. The way to do this,
if you're currently using StandardAnalyzer, is to do something like:
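  import java.io.Reader;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.CharTokenizer;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;

  public class MyAnalyzer extends Analyzer {
    private Analyzer standard = new StandardAnalyzer();
    public TokenStream tokenStream(String field, final Reader reader) {
      if ("element".equals(field)) {        // don't tokenize
        // every char is a token char, so the whole value is one token
        return new CharTokenizer(reader) {
          protected boolean isTokenChar(char c) { return true; }
        };
      } else {                              // use standard
        return standard.tokenStream(field, reader);
      }
    }
  }

  Analyzer analyzer = new MyAnalyzer();
  Query query = queryParser.parse("... +element:POST", analyzer);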
Alternately, if this query field is added by a program, then this
can be done by bypassing the analyzer for this class, building this clause
directly instead:
  Analyzer analyzer = new StandardAnalyzer();
  BooleanQuery query = (BooleanQuery)queryParser.parse("...", analyzer);
  // now add the element clause
  query.add(new TermQuery(new Term("element", "POST")), true, false);
Perhaps this should become an FAQ...
Doug
--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@;jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@;jakarta.apache.org>
--
======================================================================
Stefanos Karasavvidis
Electronics & Computer Engineer
e-mail : [EMAIL PROTECTED]
Multimedia Systems Center S.A.
Kissamou 178
73100 Chania - Crete - Hellas
http://www.multimedia-sa.gr
Tel : +30 821 0 88447
Fax : +30 821 0 88427
