I figured out a work-around in the custom analyzer by doing the following:
// --------------- code block ---------------------
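// Requires: using System.IO;
public override TokenStream TokenStream(string fieldName,
    System.IO.TextReader reader)
{
    // Insert a space after every period so that StandardTokenizer
    // splits "autoexec.bat" into [autoexec] and [bat].
    TextReader newReader =
        new StringReader(reader.ReadToEnd().Replace(".", ". "));
    TokenStream result =
        new StandardTokenizer(Lucene.Net.Util.Version.LUCENE_29, newReader);
    // Same filter chain as in the original analyzer quoted below.
    result = new StandardFilter(result);
    result = new LowerCaseFilter(result);
    return result;
}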
// -------------end code block -----------------
It seems to work this way. Thanks again.
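For anyone reusing this, the pre-processing step can be tried on its own, without Lucene.Net, to see exactly what text the tokenizer receives. This is only a minimal sketch: the class name PeriodSplitDemo and the helper SpaceAfterPeriods are made up for illustration and are not part of Lucene.Net.

```csharp
using System;
using System.IO;

static class PeriodSplitDemo
{
    // Hypothetical helper (not a Lucene.Net API): reads the whole input
    // and inserts a space after every period, mirroring the Replace call
    // used in the work-around above.
    public static TextReader SpaceAfterPeriods(TextReader reader)
    {
        string text = reader.ReadToEnd();
        return new StringReader(text.Replace(".", ". "));
    }

    static void Main()
    {
        string[] samples = { "autoexec.bat", "version 3.14", "a.b.c" };
        foreach (string s in samples)
        {
            string rewritten =
                SpaceAfterPeriods(new StringReader(s)).ReadToEnd();
            // "autoexec.bat" becomes "autoexec. bat",
            // "version 3.14" becomes "version 3. 14".
            Console.WriteLine("[{0}] -> [{1}]", s, rewritten);
        }
    }
}
```

Two side effects are worth keeping in mind. The replacement applies to every period, so decimal numbers such as 3.14 are rewritten as well and will most likely no longer come out as a single term. Also, because the text is changed before tokenizing, token offsets refer to the rewritten string and drift by one character for each earlier period, which may matter if the offsets are used for highlighting. ReadToEnd also buffers the whole field value in memory.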
On 06/16/2011 11:31 AM, Trevor Watson wrote:
I'm trying to get Lucene.Net to create terms the way we want.
I'm currently running Lucene.Net 2.9.2.2.
Basically, we want the StandardAnalyzer, except that we also
want terms to be divided at a period. The StandardAnalyzer
seems to split the two words into separate terms only if the period
is followed by whitespace.
So if we index autoexec.bat, it should produce [autoexec] and [bat], not
[autoexec.bat].
I was trying to create my own Analyzer that would do it, but could not
figure out how.
So far I have a very basic analyzer that uses the StandardTokenizer
and 2 filters.
// --------- code block ----------------------
class ExtendedStandardAnalyzer : Analyzer
{
    public override TokenStream TokenStream(string fieldName,
        System.IO.TextReader reader)
    {
        TokenStream result =
            new StandardTokenizer(Lucene.Net.Util.Version.LUCENE_29, reader);
        // TokenStream result = new LetterTokenizer(reader);
        // (doesn't work because we want numbers)
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        return result;
    }
}
// --------- end code block ------------------
Thanks in advance.