Hi again,
incze I am not sure I follow you. Do you mean that I should index with the SimpleAnalyzer and searching with the StandardAnalyzer? That is not an option for me since SimpleAnalyzer do not index number/digits the way I would like to (however, the StandardAnalyzer does!)
I found that modifying the StandardTokenizer.jj would be possible in order to modify the StandardAnalyzer to my prefered behaviour. When finished modifying the .jj-file, I run it trough JavaCC to get the .java-files. However those .java-files are not possible to compile together with the current distribution because of problems with the exception handling (see http://www.mail-archive.com/[EMAIL PROTECTED]/msg01403.html )
Next try was to compile Lucene (nightly snapshot) from scratch using Ant. That failed with errors with the "class collosions":
[javac] Compiling 128 source files to /home/clryd/lucene/jakarta-lucene/bin/classes
[javac] /home/clryd/lucene/jakarta-lucene/bin/src/org/apache/lucene/analysis/standard/StandardTokenizer.java:14: duplicate class: org.apache.lucene.analysis.standard.StandardTokenizer
[javac] public class StandardTokenizer extends org.apache.lucene.analysis.Tokenizer implements StandardTokenizerConstants {
[javac] ^
[javac] /home/clryd/lucene/jakarta-lucene/bin/src/org/apache/lucene/analysis/standard/StandardTokenizerTokenManager.java:5: duplicate class: org.apache.lucene.analysis.standard.StandardTokenizerTokenManager
[javac] public class StandardTokenizerTokenManager implements StandardTokenizerConstants
[javac] ^
So now my question is, where do I find a Lucene-verion which is possible to compile? Where do I find the source CVS f�r Lucene 1.2 r3, or something similar?
cheers /Clas
> Hi,
>
> I have been experimenting with Lucene for a few hours, and now I'm looking
> for a solution to this:
>
> When using the SimpleAnalyzer for indexing text, data like www.hotmail.com
> seem to be indexed as www, hotmail and com which mean that a search for
> "hotmail" will return a record. This is the behavior I am looking for!
> However, since SimpleAnalyzer do not index numbers by default, I would like
> to use the StandardAnalyzer. But, Standardanalyzer do not split the input
> stream at ".".
>
> Ideally I should propably make my own analyser, but that seems to be a bit
> complicated to me :(. Which is the simplest possible modification that I
> need to make to the Lucene source to make the StandardAnalyzer split, for
> example web-addresses, at "." into separately indexed words?
>
> Can this be made by modifications to the StandardTokenizer.jj? How? What is
> the easiest way of getting such modification into the "compiled" Lucene? Is
> there a need for recompiling everything?
>
> Appreciate all help!
>
> regards
> clas
You can stack up the two analyzers, first run the simple then the standard on the poutput.
incze
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
_________________________________________________________________
Tired of spam? Get advanced junk mail protection with MSN 8. http://join.msn.com/?page=features/junkmail
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
