On Fri, Sep 05, 2003 at 10:23:48PM +0000, Clas Rydergren wrote:
> Hi,
>
> I have been experimenting with Lucene for a few hours, and now I'm looking
> for a solution to this:
>
> When using the SimpleAnalyzer for indexing text, data like www.hotmail.com
> seems to be indexed as www, hotmail and com, which means that a search for
> "hotmail" will return a record. This is the behavior I am looking for!
> However, since SimpleAnalyzer does not index numbers by default, I would like
> to use the StandardAnalyzer. But StandardAnalyzer does not split the input
> stream at ".".
>
> Ideally I should probably make my own analyzer, but that seems a bit
> complicated to me :(. What is the simplest possible modification that I
> need to make to the Lucene source to make the StandardAnalyzer split,
> for example, web addresses at "." into separately indexed words?
>
> Can this be done by modifying StandardTokenizer.jj? How? What is
> the easiest way of getting such a modification into the "compiled" Lucene? Is
> there a need for recompiling everything?
>
> Appreciate all help!
>
> regards
> clas

You can stack the two analyzers: first run the simple one, then the
standard one on the output.

incze
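
A rough sketch of the stacking idea, written in plain Java rather than against Lucene's analyzer classes. The class and method names here are hypothetical and are not part of Lucene. It assumes a coarse first pass that leaves www.hotmail.com as one token, followed by a second pass that splits each token at "." and other punctuation. The second pass keeps digits, since the question notes that SimpleAnalyzer drops numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: plain Java, not the Lucene analyzer API.
final class StackedTokenizerSketch {

    // Stage 1: coarse pass. Splits on whitespace and lowercases, so a host
    // name such as "www.hotmail.com" survives as a single token.
    static List<String> coarseTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    // Stage 2: splits every incoming token at any run of characters that
    // are neither letters nor decimal digits (so "." is a split point and
    // numbers are kept).
    static List<String> splitAtPunctuation(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            for (String part : token.split("[^\\p{L}\\p{Nd}]+")) {
                if (!part.isEmpty()) {
                    out.add(part);
                }
            }
        }
        return out;
    }

    // The "stack": stage 2 consumes the output of stage 1.
    static List<String> tokenize(String text) {
        return splitAtPunctuation(coarseTokens(text));
    }
}
```

Calling StackedTokenizerSketch.tokenize("Mail me at www.hotmail.com in 2003") yields the tokens mail, me, at, www, hotmail, com, in, 2003. In Lucene itself the same shape would presumably be a tokenizer followed by a custom filter, which is an assumption here and not something this sketch shows.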
