On Sep 22, 2011, at 4:59 AM, Ian Lea wrote:

>> I am not analyzing the title:
>>
>> Field titleField = new Field("title", article.getTitle(), Field.Store.YES,
>>     Field.Index.NOT_ANALYZED);
>
> OK. But the output you quote says "no match on required clause
> (title:List of newspapers in New York)", so something is out of synch
> somewhere.
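A TermQuery only matches when its exact term text is present in the field's term dictionary. The pure-JDK sketch below (no Lucene involved) approximates the two indexing modes being discussed: with Field.Index.NOT_ANALYZED the whole title becomes a single term, while an analyzed field is split, lowercased and stripped of stop words, roughly as Ian's earlier reply (quoted further down) describes. The class name, the regex tokenizer and the small stop-word set are illustrative assumptions, not StandardAnalyzer's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class TitleTermDemo {
    // Small subset of English stop words (assumption; StandardAnalyzer's default set is larger).
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "in", "of", "the");

    // NOT_ANALYZED: the whole field value is indexed as one term, unchanged.
    static List<String> notAnalyzedTerms(String value) {
        return List.of(value);
    }

    // Toy analyzer: split on runs of non-letter/non-digit characters,
    // lowercase each token, and drop stop words.
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.split("[^\\p{L}\\p{N}]+")) {
            String t = token.toLowerCase(Locale.ROOT);
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                terms.add(t);
            }
        }
        return terms;
    }

    // A TermQuery matches only if its exact term text was indexed for the field.
    static boolean termQueryMatches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String title = "List of newspapers in New York";
        List<String> raw = notAnalyzedTerms(title);
        List<String> analyzed = analyzedTerms(title);
        System.out.println("NOT_ANALYZED terms: " + raw);
        System.out.println("analyzed terms: " + analyzed);
        System.out.println("exact title vs NOT_ANALYZED field: " + termQueryMatches(raw, title));
        System.out.println("exact title vs analyzed field: " + termQueryMatches(analyzed, title));
    }
}
```

If the index on disk really was built with the NOT_ANALYZED code above, the exact-title lookup should succeed for the document the title was read from; a no-match on the title clause therefore suggests the index was built with different settings.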
I am reindexing the content with no analysis, just in case.

>
> What does Luke show? See

Luke shows the title as unanalyzed text.

> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
> for more things to check.

I'll walk through them as soon as I can.

>
>> Do you think BooleanQuery is the right approach for solving the problem
>> (finding the Lucene score of a word or a phrase in _a_ particular document)?
>
> Sounds OK to me. You could look at the contrib MemoryIndex as a
> possible alternative.

Thanks for your help, Ian.

Peyman

>
> --
> Ian.
>
>
>> On Sep 21, 2011, at 1:00 PM, Ian Lea wrote:
>>
>>> How is the "title" field indexed? Seems likely it is analyzed, in
>>> which case a TermQuery won't match because "list of newspapers in New
>>> York" would be analyzed into terms "list", "newspapers", "new", "york",
>>> assuming things were lowercased, stop words removed, etc.
>>>
>>> Maybe you need your "word" as a TermQuery, assuming it is lowercased
>>> etc., and pass the title through the query parser. In other words,
>>> reverse what you've got for the two fields.
>>>
>>> As for performance, first narrow down where it is taking the time. If
>>> it is in Lucene, read
>>> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>>>
>>> --
>>> Ian.
>>>
>>> On Wed, Sep 21, 2011 at 5:38 PM, Peyman Faratin <pey...@robustlinks.com>
>>> wrote:
>>>> Hi
>>>>
>>>> The problem I would like to solve is determining the Lucene score of a
>>>> word in _a particular_ given document. The two candidates I have been
>>>> trying are:
>>>>
>>>> - QueryWrapperFilter
>>>> - BooleanQuery
>>>>
>>>> Both restrict the search to a given search space. But according to Doug
>>>> Cutting, the QueryWrapperFilter option is less preferable than BooleanQuery.
>>>> However, I am experiencing both performance problems (very slow) and
>>>> result problems (the query does not match any doc).
>>>>
>>>> The setup is as follows.
>>>> Given a user query "word":
>>>>
>>>> QueryParser parser = new QueryParser(Version.LUCENE_32, "content",
>>>>     new StandardAnalyzer(Version.LUCENE_32));
>>>> Query query = parser.parse(word);
>>>> Document d = WikiIndexSearcher.doc(match.doc);
>>>> docTitle = d.get("title");
>>>> TermQuery titleQuery = new TermQuery(new Term("title", docTitle));
>>>> BooleanQuery bQuery = new BooleanQuery();
>>>> bQuery.add(titleQuery, BooleanClause.Occur.MUST);
>>>> bQuery.add(query, BooleanClause.Occur.MUST);
>>>> TopDocs hits = WikiIndexSearcher.search(bQuery, 1);
>>>>
>>>> In other words, find a Wikipedia doc with a particular title (in the
>>>> example below it is "List of newspapers in New York",
>>>> http://en.wikipedia.org/wiki/List_of_newspapers_in_New_York). We then
>>>> create a BooleanQuery in which the title must match that title and the
>>>> content must match the user query ("american" in the example below).
>>>>
>>>> Here is the output of a run on the user query "american" in the doc
>>>> titled "List of newspapers in New York":
>>>>
>>>> ... QUERY: content:american
>>>> ... doc: List of newspapers in New York
>>>> ... query: +title:List of newspapers in New York +content:american
>>>> ... explanation 568744: 0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
>>>>   0.0 = no match on required clause (title:List of newspapers in New York)
>>>>   0.011818626 = (MATCH) weight(content:american in 212081), product of:
>>>>     0.15625292 = queryWeight(content:american), product of:
>>>>       2.4204094 = idf(docFreq=392249, maxDocs=1623450)
>>>>       0.0645564 = queryNorm
>>>>     0.075637795 = (MATCH) fieldWeight(content:american in 212081), product of:
>>>>       1.0 = tf(termFreq(content:american)=1)
>>>>       2.4204094 = idf(docFreq=392249, maxDocs=1623450)
>>>>       0.03125 = fieldNorm(field=content, doc=212081)
>>>>
>>>> As you can see, there is no match for the query (and hits.totalHits is 0).
>>>> The search is also very slow.
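The numbers in this explanation can be reproduced by hand from Lucene 3.x's DefaultSimilarity formulas: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and the clause score is queryWeight * fieldWeight. queryNorm is 1 / sqrt(sum of squared clause weights) taken over both MUST clauses, so it also carries the title clause's idf. The sketch below (hypothetical class name ExplainArithmetic; all boosts assumed to be 1.0) recomputes the content clause and backs out the title idf from the reported queryNorm.

```java
class ExplainArithmetic {
    // DefaultSimilarity idf (Lucene 3.x), reproduced in plain Java.
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log(maxDocs / (double) (docFreq + 1));
    }

    // DefaultSimilarity tf: square root of the raw term frequency.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // Values copied from the explanation of content:american.
        long maxDocs = 1623450;
        double idfContent = idf(392249, maxDocs);
        double queryNorm = 0.0645564;
        double fieldNorm = 0.03125;

        double queryWeight = idfContent * queryNorm;
        double fieldWeight = tf(1) * idfContent * fieldNorm;
        double score = queryWeight * fieldWeight;
        System.out.printf("idf=%.7f queryWeight=%.8f fieldWeight=%.9f score=%.9f%n",
                idfContent, queryWeight, fieldWeight, score);

        // Assuming boosts of 1.0: queryNorm = 1 / sqrt(idfTitle^2 + idfContent^2),
        // so the title clause's idf can be solved for.
        double idfTitle = Math.sqrt(1.0 / (queryNorm * queryNorm) - idfContent * idfContent);
        // idf that a term with docFreq == 0 would receive.
        double idfUnseen = idf(0, maxDocs);
        System.out.printf("inferred title idf=%.4f, idf for docFreq=0: %.4f%n", idfTitle, idfUnseen);
    }
}
```

Under those assumptions the recovered title idf comes out at about 15.30, which is 1 + ln(1623450), the value a term with docFreq = 0 receives. That would mean no document in the searched index contained the exact title term, consistent with Ian's suspicion that the index and the indexing code are out of sync.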
>>>>
>>>> Any help would be much appreciated.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org