On Tuesday 17 January 2006 20:52, Dan Katz wrote:
...
> Question 1)   Is there a way in Lucene to have some sort of limit based
> on term count.  For example,  "atleast5 Apple" to find items with the
> word apple only when it has at least 5 mentions.

This can be done, but you'll need to write your own TermQuery and
TermScorer for this. Just add the requirement of the minimum number
of term occurrences in your own TermScorer.
Have a look at the Java code of TermScorer, it should be straightforward
to do this.

>  
> Question 2) We use Lucene to index articles from Web sites. When I have
> these documents I want to find when a Web site is mentioned, but not the
> email addresses of a Web site.   I write something like "website.com NOT
> [EMAIL PROTECTED]".  This works to a point.  However, it also excludes the
> documents when the website.com AND the @website.com is mentioned.  I
> want to eliminate the content that only has @website.com but keep it
> whenever the @ is not present.  Does anyone know how I would write this
> query?

You'll need to make sure the the query term website.com does not match
@website.com so you can simply query for website.com.
I don't know how the StandardAnalyzer deals with this case.
You may need to use your own Analyzer to make sure that @website.com
is only indexed as @website.com and never as website.com .
If you need to know how some text is indexed try Luke:
http://www.getopt.org/luke/

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to