The only caveat to your VerbatimAnalyzer is that it will still split strings that are over 255 characters. CharTokenizer does that. Granted, though, that keyword fields probably don't make much sense to be that long.

As mentioned yesterday - I added the LIA KeywordAnalyzer into the contrib area of Subversion. I had built one like you had also, but the one I contributed reads the entire input stream into a StringBuffer ensuring it does not get split like CharTokenizer would.

        Erik


On Feb 9, 2005, at 4:40 AM, Miles Barr wrote:

On Tue, 2005-02-08 at 12:19 -0500, Steven Rowe wrote:
Why is there no KeywordAnalyzer?  That is, an analyzer which doesn't
mess with its input in any way, but just returns it as-is?

I realize that under most circumstances, it would probably be more code
to use it than just constructing a TermQuery, but having it would
regularize query handling, and simplify new users' experience. And for
the purposes of the PerFieldAnalyzerWrapper, it could be helpful.

It's fairly straightforward to write one. Here's the one I put together for PerFieldAnalyzerWrapper situations:


package org.apache.lucene.analysis;

import java.io.Reader;

public class VerbatimAnalyzer extends Analyzer {

    public VerbatimAnalyzer() {
        super();
    }

    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new VerbatimTokenizer(reader);

        return result;
    }


/** * This tokenizer assumes that the entire input is just one token. */ public static class VerbatimTokenizer extends CharTokenizer {

        public VerbatimTokenizer(Reader reader) {
            super(reader);
        }

        protected boolean isTokenChar(char c) {
            return true;
        }
    }
}


-- Miles Barr <[EMAIL PROTECTED]> Runtime Collective Ltd.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to