Re: [Lucene.Net] Analyzer Question for Lucene.Net

Trevor Watson Thu, 16 Jun 2011 08:57:09 -0700

I figured out a work-around in the custom analyzer by doing the following:


// --------------- code block ---------------------

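public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
{
    // Insert a space after each period so the StandardTokenizer splits there
    TextReader newReader = new StringReader(reader.ReadToEnd().Replace(".", ". "));
    TokenStream result = new StandardTokenizer(Lucene.Net.Util.Version.LUCENE_29, newReader);
    // ... then apply the filters and return result, as before
}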


// -------------end code block -----------------


It seems to work this way.  Thanks again.
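
For anyone reading this later, here is a rough sketch of what the complete analyzer presumably looks like with the work-around applied. It only combines the two lines above with the original class from the quoted message below. The using directives and the local name "spaced" are my assumptions; nothing here goes beyond the calls already shown in this thread.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;

class ExtendedStandardAnalyzer : Analyzer
{
    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        // Read the whole field and put a space after every period, so that
        // "autoexec.bat" reaches the tokenizer as "autoexec. bat".
        // ("spaced" is just an illustrative name.)
        TextReader spaced = new StringReader(reader.ReadToEnd().Replace(".", ". "));

        // Same chain as the original analyzer, fed from the rewritten text.
        TokenStream result = new StandardTokenizer(Lucene.Net.Util.Version.LUCENE_29, spaced);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        return result;
    }
}
```

Two side effects of this approach: ReadToEnd() pulls the entire field into memory, and because characters are inserted, token offsets no longer line up with the original text, which may matter if offsets are used for highlighting. Every period is affected, so something like 3.14 would also be split into two terms.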




On 06/16/2011 11:31 AM, Trevor Watson wrote:

I'm trying to get Lucene.Net to create terms the way that we want it to happen. I'm currently running Lucene.Net 2.9.2.2.
Basically, we want the StandardAnalyzer, with the exception that we want terms to be divided at a period as well. The StandardAnalyzer seems to split the two words into separate terms only if the period is followed by white-space.
So if we index autoexec.bat, it should produce [autoexec] and [bat], not [autoexec.bat].
I was trying to create my own Analyzer that would do it, but could not figure out how.
So far I have a very basic analyzer that uses the StandardTokenizer and 2 filters.
// --------- code block ----------------------

class ExtendedStandardAnalyzer : Analyzer
{
    public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
    {
        TokenStream result = new StandardTokenizer(Lucene.Net.Util.Version.LUCENE_29, reader);
        // TokenStream result = new LetterTokenizer(reader); // doesn't work because we want numbers
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);

        return result;
    }
}
// --------- end code block ------------------


Thanks in advance.
