Re: Problem with BooleanQuery

Ian Lea Wed, 21 Sep 2011 10:00:59 -0700

How is the "title" field indexed?  It seems likely that it is analyzed,
in which case a TermQuery won't match: "list of newspapers in New
York" would be analyzed into the terms "list", "newspapers", "new", "york"
(assuming things were lowercased, stop words removed etc.), so the index
holds no single term equal to the whole title.
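
To illustrate the point, here is a minimal sketch in plain Java. It does not
use Lucene: analyze() is a hypothetical, much simplified stand-in for an
analyzer, and the stop-word list is an assumption made for this example. It
only shows why the unanalyzed title cannot equal any indexed term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified stand-in for an analyzer. This is NOT Lucene's StandardAnalyzer.
final class TitleAnalysisSketch {
    // Small illustrative stop-word list (an assumption for this example).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "in", "of", "the"));

    // Lowercase the text, split on non-alphanumerics, drop stop words.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String title = "List of newspapers in New York";
        List<String> indexedTerms = analyze(title);

        // The terms an analyzed field would hold for this title.
        System.out.println(indexedTerms);

        // A term built from the raw title is compared as one whole string,
        // so it matches none of the indexed terms.
        System.out.println(indexedTerms.contains(title));

        // Analyzing the title at query time yields terms that are all present.
        System.out.println(indexedTerms.containsAll(analyze(title)));
    }
}
```

Under these assumptions the first line printed is [list, newspapers, new, york],
the raw-title lookup prints false, and the analyzed lookup prints true.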


Maybe you need your "word" as a TermQuery on "content", assuming it is
lowercased etc., and to pass the title through the query parser.  In other
words, reverse what you've got for the two fields.

As for performance, first narrow down where it is taking the time.  If
it is in lucene, read
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
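
One way to narrow it down is to time each step separately (parsing, fetching
the stored document, running the search).  A minimal sketch in plain Java
follows; timed() is a hypothetical helper, not part of Lucene, and the two
steps in main() are placeholders for the real calls.

```java
import java.util.concurrent.Callable;

// Hypothetical timing helper for finding the slow step. Not a Lucene API.
final class SearchTimingSketch {
    // Runs one step, prints how long it took, and returns the step's result.
    static <T> T timed(String label, Callable<T> step) {
        long start = System.nanoTime();
        try {
            return step.call();
        } catch (Exception e) {
            // Callable lets the step throw checked exceptions; rethrow unchecked.
            throw new RuntimeException(label + " failed", e);
        } finally {
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            System.out.println(label + ": " + elapsedMicros + " us");
        }
    }

    public static void main(String[] args) {
        // Placeholder steps; substitute the real parse, doc fetch and search calls.
        String parsed = timed("parse", () -> "content:" + "american");
        int hits = timed("search", () -> parsed.length());
        System.out.println(parsed + " -> " + hits);
    }
}
```

If most of the time turns out to be outside the search call itself, the wiki
page is unlikely to help much.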


--
Ian.

On Wed, Sep 21, 2011 at 5:38 PM, Peyman Faratin <pey...@robustlinks.com> wrote:
> Hi
>
> The problem I would like to solve is determining the Lucene score of a word
> in _a particular_ given document. The 2 candidates I have been trying are
>
> - QueryWrapperFilter
> - BooleanQuery
>
> Both restrict a search to a smaller search space. But according to Doug
> Cutting, the QueryWrapperFilter option is less preferable than BooleanQuery.
> However, I am experiencing both performance problems (very slow) and response
> problems (the query is not matched to any doc).
>
> The setup is as follows. Given a user query "word":
>
> QueryParser parser = new QueryParser(Version.LUCENE_32, "content",new 
> StandardAnalyzer(Version.LUCENE_32));
> Query query = parser.parse(word);
> Document d = WikiIndexSearcher.doc(match.doc);
> docTitle = d.get("title");
> TermQuery titleQuery = new TermQuery(new Term("title", docTitle));
> BooleanQuery bQuery = new BooleanQuery();
> bQuery.add(titleQuery, BooleanClause.Occur.MUST);
> bQuery.add(query, BooleanClause.Occur.MUST);
> TopDocs hits = WikiIndexSearcher.search(bQuery, 1);
>
> In other words, find a Wikipedia doc with a particular title (in the example
> below it is "list of newspapers in New York",
> http://en.wikipedia.org/wiki/List_of_newspapers_in_New_York). We then create
> a boolean query in which the title must match and the content must match
> the user query ('american' in the example below).
>
> Here is the output of a run on the user query "american" in a doc with the
> title "list of newspapers in New York":
>
> ... QUERY: content:american
> ... doc: List of newspapers in New York
> ... query: +title:List of newspapers in New York +content:american
> ... explanation 568744: 0.0 = (NON-MATCH) Failure to meet condition(s) of 
> required/prohibited clause(s)
>  0.0 = no match on required clause (title:List of newspapers in New York)
>  0.011818626 = (MATCH) weight(content:american in 212081), product of:
>    0.15625292 = queryWeight(content:american), product of:
>      2.4204094 = idf(docFreq=392249, maxDocs=1623450)
>      0.0645564 = queryNorm
>    0.075637795 = (MATCH) fieldWeight(content:american in 212081), product of:
>      1.0 = tf(termFreq(content:american)=1)
>      2.4204094 = idf(docFreq=392249, maxDocs=1623450)
>      0.03125 = fieldNorm(field=content, doc=212081)
>
> As you can see, there is no match for the query (and hits.totalHits is 0).
> The search is very slow too.
>
> Any help would be much appreciated
