Erik,

I think there may be a typo on the website.

When I run the AnalyzerDemo:

Analzying "xy&z corporation - [EMAIL PROTECTED]"
        org.apache.lucene.analysis.standard.StandardAnalyzer:
                [xy&z] [corporation] [EMAIL PROTECTED] 

Your website says:

    org.apache.lucene.analysis.standard.StandardAnalyzer:
        [xy&z] [corporation] [EMAIL PROTECTED] [com] 

When I run it, it keeps the entire email address '[EMAIL PROTECTED]' as
one token, but according to your website it separates
'[EMAIL PROTECTED]' from 'com'.

Is there a difference between the versions of Lucene? I'm using 1.3rc2.

Also, I think what I want is a StandardAnalyzer with a little tweaking.
The simple one was fine until I realized that it doesn't handle numbers,
which I need as part of my search since numbers are important for what
I'm doing. The StandardAnalyzer handles numbers, but I need it to behave
a little differently. Thanks for the site.
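
To illustrate the numbers problem, here is a minimal sketch in plain
Java. It does not use Lucene at all: NumberTokenSketch and tokenize are
hypothetical names, and the letters-only mode is only a rough stand-in
for an analyzer that splits at every non-letter, not the real
implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not Lucene code: shows how a letters-only
// tokenizer silently drops digits that a search may depend on.
class NumberTokenSketch {

    // Splits text into lowercased tokens. Letters always count as token
    // characters; digits count only when keepDigits is true.
    static List<String> tokenize(String text, boolean keepDigits) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            boolean wanted = Character.isLetter(c)
                    || (keepDigits && Character.isDigit(c));
            if (wanted) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Any other character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "build 1042 failed on host7";
        // prints [build, failed, on, host]
        System.out.println(tokenize(text, false));
        // prints [build, 1042, failed, on, host7]
        System.out.println(tokenize(text, true));
    }
}
```

With the letters-only mode, "1042" never becomes a token, so no query
for it could match. That is the behavior I need to avoid.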

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 26, 2003 4:58 AM
To: Lucene Users List
Subject: Re: Search Question - not returning desired results


On Tuesday, November 25, 2003, at 12:11  PM, Pleasant, Tracy wrote:
>
> The documents I have indexed also contain information regarding file
> names.
>
> For instance 'return_results.pl' or something like that may be in the 
> document fields.
>
> I am not understanding Lucene's way of searching:
>
> 1. If I search for 'return_results', the search does not return
>    anything
> 2. If I search for 'results' or 'return', the search does not return
>    anything
> 3. If I search for 'results.pl', the search does return the document
>    containing 'return_results.pl'
> 4. If I search for 'results~', the search does return the document
>    containing 'return_results.pl'
> 5. If I search for 'return_results~', the search does not return
>    anything
>
> What is going on?
>
> I want it to return the document in all of the situations.
>
> I also don't want to have to use '~' all the time.

We sure do have a recurring theme lately :)  Analysis!

Please refer to my article at java.net:

        http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html

Look at the AnalysisDemo code.  Copy it over and try it out on the text 
you're using and the Analyzer you're using.  The bracketed text that 
comes out are the "tokens" that you can search on.  It is very very 
important to understand this process and to really know what terms come 
out of text you hand it - otherwise it is a mystery why some things can 
be found and some things cannot despite your expectations to the 
contrary.
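
As a rough illustration of why the tokens matter, here is a minimal
sketch in plain Java. It is not the AnalysisDemo code and does not call
Lucene: TokenMatchSketch and its methods are hypothetical names, and the
two tokenizers are simplified stand-ins (one splits on whitespace only,
one splits at every non-letter), not the behavior of any particular
Lucene Analyzer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not Lucene code: the same text yields
// different tokens under different splitting rules, and a plain term
// lookup only succeeds when the term equals one of those tokens.
class TokenMatchSketch {

    // Splits on whitespace only, so punctuation stays inside a token.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Splits at every non-letter character and lowercases the result.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Formats tokens in the bracketed style the demo output uses.
    static String bracketed(List<String> tokens) {
        StringBuilder out = new StringBuilder();
        for (String token : tokens) {
            out.append('[').append(token).append("] ");
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        String text = "see return_results.pl";
        List<String> ws = whitespaceTokens(text);
        List<String> letters = letterTokens(text);
        System.out.println(bracketed(ws));      // [see] [return_results.pl]
        System.out.println(bracketed(letters)); // [see] [return] [results] [pl]
        // A term lookup is an exact comparison against the tokens.
        System.out.println(ws.contains("results"));      // false
        System.out.println(letters.contains("results")); // true
    }
}
```

The real Analyzer you are using will split differently from either of
these, which is exactly why it is worth running the demo on your own
text.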

A follow-up to the Analysis is querying - and QueryParser has its own 
set of quirks and caveats related to how things are tokenized/analyzed. 
And, I've got just the follow-up article for you handy...

        http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html

If you digest both of these articles (analysis one first please) then I 
think a lot of questions that get asked on this list will be implicitly 
answered.  Understanding analysis is key.

        Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

