Let me leave, thanks~

2012/5/8 Li Li <fancye...@gmail.com>
> But this only gets you (term1 or term2 or term3 ...). You can't
> implement (term1 or term2 ...) and (term3 or term4) with this method.
> Maybe you should write your own Scorer to deal with this kind of query.
>
> On Tue, May 8, 2012 at 9:44 PM, Li Li <fancye...@gmail.com> wrote:
> > A disjunction query is much slower than a conjunction query. That's why
> > many search engines use conjunction as the default.
> > By the way, you say you have 5,000,000 documents. How many documents
> > match your query? Do you need to sort by relevance score, or do you just
> > want the matches and don't care about the order?
> > If you don't care about sorting, you may try to use a filter,
> > e.g.
> > Query allDocsQuery = parser.parse("*:*");
> > TermsFilter cityFilter = new TermsFilter();
> > for (String term : terms) {
> >     cityFilter.addTerm(new Term("city", term));
> > }
> > searcher.search(allDocsQuery, cityFilter);
> >
> > I am not sure this method is faster than a boolean OR query.
> > In theory, BooleanScorer is a TAAT method (it traverses each term in a 2k
> > window). BooleanScorer2 is a DAAT algorithm. BooleanScorer is faster
> > than BooleanScorer2, but it can't support required queries and exclusive
> > queries, and the term count must be less than 32 (because it uses a
> > 32-bit integer to remember which terms hit).
> > TermsFilter is similar to BooleanScorer: it traverses all terms and uses
> > a bitset to mark the documents that were hit. If your number of matched
> > documents is very large, it may be faster than BooleanScorer2.
> >
> > On Tue, May 8, 2012 at 6:54 PM, 齐保元 <qibaoy...@126.com> wrote:
> >> Thanks for your reply, first of all. The reason for so many OR terms is
> >> to monitor the terms. One scenario: if I want to know about the cities
> >> of a province and the events that happen there, I may build a query like
> >> "(California or NewYork or SanFransico.... or SomePlace) and (Pollution
> >> or Criminal ... or Alcohol)". So the long query happens... I hope I have
> >> described the question clearly.
> >> ----------------
> >> At 2012-05-08 18:44:13, "Li Li" <fancye...@gmail.com> wrote:
> >>> A disjunction (OR) query with so many terms is indeed slow.
> >>> Can you describe your real problem? Why do you need the disjunction
> >>> results of so many terms?
> >>>
> >>> On Sun, May 6, 2012 at 9:57 PM, qibaoy...@126.com <qibaoy...@126.com> wrote:
> >>>> Hi,
> >>>> I ran into a problem searching for many keywords in about
> >>>> 5,000,000 documents. For example, the query may look like
> >>>> "(a1 or a2 or a3 ....a200) and (b1 or b2 or b3 or b4 ..... b400)".
> >>>> I found it takes a very long time (40 seconds) to get the answer on
> >>>> only one field (the Title field), and the JVM throws an OutOfMemory
> >>>> error with more fields (title field plus content field). Any
> >>>> suggestions or good ideas to solve this problem? Thanks in advance.
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
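
The bitset idea mentioned in the thread (walk every term and mark the hit documents in a bitset) can in principle cover the two-group case too: build one bitset per OR group, then intersect them. Below is a minimal plain-Java sketch of that, assuming a toy in-memory inverted index instead of Lucene. The class `GroupedOrQuery` and its methods `add`, `anyOf` and `anyOfEach` are hypothetical names made up for illustration; they are not part of any Lucene API, and the sketch does no scoring at all.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy index, NOT a Lucene class: term -> bitset of doc ids.
final class GroupedOrQuery {
    private final Map<String, BitSet> postings = new HashMap<>();

    // Record that document docId contains each of the given terms.
    void add(int docId, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new BitSet()).set(docId);
        }
    }

    // One OR group: union the posting bitsets of all terms in the group.
    BitSet anyOf(List<String> terms) {
        BitSet hits = new BitSet();
        for (String term : terms) {
            BitSet docs = postings.get(term);
            if (docs != null) {
                hits.or(docs);
            }
        }
        return hits;
    }

    // (group1 OR-terms) AND (group2 OR-terms) AND ...: intersect the unions.
    BitSet anyOfEach(List<List<String>> groups) {
        BitSet result = null;
        for (List<String> group : groups) {
            BitSet hits = anyOf(group);
            if (result == null) {
                result = hits;      // first group seeds the result
            } else {
                result.and(hits);   // later groups narrow it down
            }
        }
        return result == null ? new BitSet() : result;
    }

    public static void main(String[] args) {
        GroupedOrQuery index = new GroupedOrQuery();
        index.add(0, "california", "pollution");
        index.add(1, "newyork", "sports");
        index.add(2, "texas", "alcohol");
        index.add(3, "newyork", "alcohol");

        BitSet matches = index.anyOfEach(Arrays.asList(
                Arrays.asList("california", "newyork"),
                Arrays.asList("pollution", "alcohol")));
        System.out.println(matches); // prints {0, 3}
    }
}
```

The cost is one pass over the postings of every term plus one bitset of maxDoc bits per group, so it only makes sense when, as said above, matching is all that is needed and relevance sorting does not matter.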