Re: Problem with BooleanQuery

Peyman Faratin Wed, 21 Sep 2011 12:25:32 -0700

Hi Ian

I am not analyzing the title; the field is created like this:


Field titleField = new Field("title", article.getTitle(),Field.Store.YES, 
Field.Index.NOT_ANALYZED);
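
To make the difference concrete, here is a minimal plain-Java sketch (no Lucene on the classpath) of what I understand NOT_ANALYZED to mean: the whole title is indexed as a single term, so a term query has to equal it exactly, case included. The class TitleTermSketch, its methods and its tiny stop list are made up for illustration; they are a toy stand-in, not Lucene's StandardAnalyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TitleTermSketch {
    // Toy stop list (assumption): a small subset, not Lucene's real stop set.
    private static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("a", "an", "the", "of", "in"));

    /** Toy stand-in for an analyzed field: lowercase, split on whitespace, drop stop words. */
    public static List<String> analyzedTerms(String text) {
        List<String> terms = new ArrayList<String>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    /** Stand-in for a NOT_ANALYZED field: the whole value is one term, unchanged. */
    public static List<String> notAnalyzedTerms(String text) {
        return Collections.singletonList(text);
    }

    /** A term query matches only if the query term equals an indexed term exactly. */
    public static boolean termMatches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String title = "List of newspapers in New York";
        // Analyzed: the title becomes several lowercase terms.
        System.out.println(analyzedTerms(title));
        // Not analyzed: only the exact, case-sensitive title matches.
        System.out.println(termMatches(notAnalyzedTerms(title), title));
        System.out.println(termMatches(notAnalyzedTerms(title), title.toLowerCase(Locale.ROOT)));
        // The full title is never a term of the analyzed field.
        System.out.println(termMatches(analyzedTerms(title), title));
    }
}
```

Under these assumptions the exact stored title matches the un-analyzed field, while a differently cased title, or the full title looked up against the analyzed terms, does not.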

Do you think BooleanQuery is the right approach for solving the problem 
(finding the Lucene score of a word or a phrase in _a_ particular document)?
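
For reference, the score I am after is the one shown in the content:american part of the explain output quoted below. A minimal sketch of that arithmetic in plain Java follows, assuming the tf and idf formulas of Lucene 3.x DefaultSimilarity (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))). The class name ExplainArithmetic is made up, and the sketch covers only a single term clause; it ignores coord and the summing over clauses that a BooleanQuery adds.

```java
public class ExplainArithmetic {
    /** idf, assuming the Lucene 3.x DefaultSimilarity formula. */
    public static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /** tf, assuming the Lucene 3.x DefaultSimilarity formula. */
    public static double tf(double termFreq) {
        return Math.sqrt(termFreq);
    }

    /** Score of one term clause: queryWeight * fieldWeight, as in the explain tree. */
    public static double score(double termFreq, double idf,
                               double queryNorm, double fieldNorm) {
        double queryWeight = idf * queryNorm;                // idf * queryNorm
        double fieldWeight = tf(termFreq) * idf * fieldNorm; // tf * idf * fieldNorm
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // Values copied from the explain output for content:american.
        double idf = idf(392249L, 1623450L);
        double queryNorm = 0.0645564;
        double fieldNorm = 0.03125;
        // Compare with idf = 2.4204094 and score = 0.011818626 in the explain output.
        System.out.println("idf   = " + idf);
        System.out.println("score = " + score(1.0, idf, queryNorm, fieldNorm));
    }
}
```

The values printed should agree with the explain output up to float rounding, since Lucene computes these in single precision.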

thanks for your help

peyman


On Sep 21, 2011, at 1:00 PM, Ian Lea wrote:

> How is the "title" field indexed?  Seems likely it is analyzed in
> which case a TermQuery won't match because "list of newspapers in New
> York" would be analyzed into terms "list", "newspapers", "new", "york"
> assuming things were lowercased, stop words removed etc.
> 
> Maybe you need your "word" as TermQuery, assuming it is lowercased
> etc., and pass the title through query parser.  In other words,
> reverse what you've got for the two fields.
> 
> As for performance, first narrow down where it is taking the time.  If
> it is in lucene, read
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
> 
> 
> --
> Ian.
> 
> On Wed, Sep 21, 2011 at 5:38 PM, Peyman Faratin <[email protected]> 
> wrote:
>> Hi
>> 
>> The problem I would like to solve is determining the Lucene score of a word 
>> in _a particular_ given document. The two candidates I have been trying are:
>> 
>> - QueryWrapperFilter
>> - BooleanQuery
>> 
>> Both restrict the search to a subset of the index. But according to Doug 
>> Cutting, the QueryWrapperFilter option is less preferable than BooleanQuery. 
>> However, I am experiencing both performance problems (very slow) and 
>> response problems (the query is not matched to any doc).
>> 
>> The setup is as follows. Given a user query "word":
>> 
>> QueryParser parser = new QueryParser(Version.LUCENE_32, "content",
>>     new StandardAnalyzer(Version.LUCENE_32));
>> Query query = parser.parse(word);
>> Document d = WikiIndexSearcher.doc(match.doc);
>> docTitle = d.get("title");
>> TermQuery titleQuery = new TermQuery(new Term("title", docTitle));
>> BooleanQuery bQuery = new BooleanQuery();
>> bQuery.add(titleQuery, BooleanClause.Occur.MUST);
>> bQuery.add(query, BooleanClause.Occur.MUST);
>> TopDocs hits = WikiIndexSearcher.search(bQuery, 1);
>> 
>> In other words, find a Wikipedia doc with a particular title (in the example 
>> below it is "list of newspapers in New York", 
>> http://en.wikipedia.org/wiki/List_of_newspapers_in_New_York). We then 
>> create a boolean query in which the title must match that doc's title and 
>> the content must match the user query ('american' in the example below).
>> 
>> Here is the output of a run on the user query "american" in a doc with the 
>> title "list of newspapers in New York".
>> 
>> ... QUERY: content:american
>> ... doc: List of newspapers in New York
>> ... query: +title:List of newspapers in New York +content:american
>> ... explanation 568744: 0.0 = (NON-MATCH) Failure to meet condition(s) of 
>> required/prohibited clause(s)
>>  0.0 = no match on required clause (title:List of newspapers in New York)
>>  0.011818626 = (MATCH) weight(content:american in 212081), product of:
>>    0.15625292 = queryWeight(content:american), product of:
>>      2.4204094 = idf(docFreq=392249, maxDocs=1623450)
>>      0.0645564 = queryNorm
>>    0.075637795 = (MATCH) fieldWeight(content:american in 212081), product of:
>>      1.0 = tf(termFreq(content:american)=1)
>>      2.4204094 = idf(docFreq=392249, maxDocs=1623450)
>>      0.03125 = fieldNorm(field=content, doc=212081)
>> 
>> As you can see, there is no match for the query (and hits.totalHits is 0). 
>> The search is very slow too.
>> 
>> Any help would be much appreciated.
> 
