Re: replace values in index

Mark Miller Thu, 12 Jul 2007 07:09:42 -0700

While it is possible to alter the StandardAnalyzer, depending on more details of your source text, it may be better to use a different analyzer or make your own. The StandardAnalyzer is quite slow if you do not need all of its features, and modifying it will make it harder to keep up with bug fixes or improvements.
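To make the "make your own" option concrete, here is a minimal sketch of the splitting logic such an analyzer might use: anything that is not a letter or digit (including ',') ends the current token, and tokens are lowercased. It is plain Java and deliberately does not use Lucene's API; the class and method names are hypothetical, and in a real analyzer this logic would live inside a tokenizer class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free illustration of a simple custom tokenizer.
class CommaAwareTokenizerSketch {

    // Splits on every character that is not a letter or digit and
    // lowercases what remains. Commas therefore act as separators.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A separator closes the token collected so far.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [some, text, word1, word2, word3, word4, word5]
        System.out.println(tokenize("Some text word1,word2,word3,word4,word5"));
    }
}
```

With splitting like this, 'word3' becomes its own term, at the cost of the extra token types StandardAnalyzer recognizes.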

That said, StandardAnalyzer does split on commas, so you might want to check into what's really going on.

I suspect that 'word1,word2,word3,word4,word5' is being recognized as a NUM by StandardAnalyzer. A NUM match will keep a comma-delimited list intact as long as every other word contains a digit.
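The "every other word contains a digit" condition can be checked with a small stand-alone sketch. This is not the StandardAnalyzer grammar itself, only an approximation of the rule as described above: it tests a whole string rather than doing longest-match tokenizing, it assumes ASCII letters and digits, and it assumes the punctuation set is _ - / . and ','. All names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical approximation of the NUM rule described above; not Lucene code.
class NumRuleSketch {
    // Assumed punctuation that may join the parts of a NUM-like token.
    private static final Pattern P = Pattern.compile("[_\\-/.,]");
    private static final Pattern ALPHANUM = Pattern.compile("[A-Za-z0-9]+");

    static boolean hasDigit(String s) {
        return s.chars().anyMatch(Character::isDigit);
    }

    // True if every second part, starting at index 'start', contains a digit.
    static boolean everyOtherHasDigit(String[] parts, int start) {
        for (int i = start; i < parts.length; i += 2) {
            if (!hasDigit(parts[i])) {
                return false;
            }
        }
        return true;
    }

    // True if the whole string would stay together as one NUM-like token.
    static boolean looksLikeNum(String text) {
        String[] parts = P.split(text, -1);
        if (parts.length < 2) {
            return false; // a single word is not a punctuated list
        }
        for (String part : parts) {
            if (!ALPHANUM.matcher(part).matches()) {
                return false;
            }
        }
        return everyOtherHasDigit(parts, 0) || everyOtherHasDigit(parts, 1);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeNum("word1,word2,word3,word4,word5")); // true
        System.out.println(looksLikeNum("alpha,beta,gamma"));              // false
        System.out.println(looksLikeNum("alpha,b2,gamma"));                // true
    }
}
```

Under these assumptions, Jeff's example stays glued together because every word ends in a digit, while a list of purely alphabetic words would be split at the commas.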

You might alter the <#P regular expression in StandardAnalyzer.jj by taking out the ','. This will take out certain matches (like the match you're getting <g>), but it will stop the comma from screwing up your matches.


- Mark

Jeff wrote:

I have documents with lots of text. Part of the text is in the following
format:

word1,word2,word3,word4,word5

I am currently using the StandardAnalyzer and everything is working great
with the other data, except I can't query for 'word3' as a ',' isn't a token
separator. Is there an easy way to add ',' as a token separator?

Thanks,

-Jeff


