I have been experimenting with Lucene for a few hours, and now I'm looking for a solution to this:
When I use the SimpleAnalyzer for indexing text, data like www.hotmail.com is indexed as www, hotmail and com, which means that a search for "hotmail" will return the record. This is the behavior I am looking for! However, since SimpleAnalyzer does not index numbers, I would like to use the StandardAnalyzer. The problem is that StandardAnalyzer does not split the input stream at ".", so a host name like this ends up as a single token.
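To make the goal concrete, here is a minimal sketch in plain Java of the token stream I am hoping for. It does not use any Lucene classes; the class name DesiredTokens and the method tokenize are made up purely for illustration. The assumption is that splitting on every character that is not a letter or digit, and lowercasing the rest, is close enough to what I want (it keeps numbers, unlike SimpleAnalyzer, and splits at ".", unlike StandardAnalyzer).

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: NOT Lucene code. Shows the tokens I would like to end up in the index.
public class DesiredTokens {

    // Hypothetical helper: breaks text at any character that is not a
    // letter or digit, and lowercases each resulting token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A separator such as '.' or ' ' ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("www.hotmail.com"));            // prints [www, hotmail, com]
        System.out.println(tokenize("Room 42 at www.example.com")); // prints [room, 42, at, www, example, com]
    }
}
```

In other words, "hotmail" and "42" should both be searchable on their own.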
Ideally I should probably write my own analyzer, but that seems a bit complicated to me :(. What is the simplest possible modification I need to make to the Lucene source so that the StandardAnalyzer splits, for example, web addresses at "." into separately indexed words?
Can this be done by modifying StandardTokenizer.jj? If so, how? What is the easiest way of getting such a modification into the "compiled" Lucene? Do I need to recompile everything?
Appreciate all help!
regards clas
