Hi,

FYI, the old NumericRangeQuery is fast here because it rewrites to a constant-score BooleanQuery for this low-cardinality case! If you have no real range, it rewrites to a TermQuery!

Points are different: they are not so good for simple term-based lookups.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Florian Hopf [mailto:mailingli...@florian-hopf.de]
> Sent: Wednesday, November 2, 2016 8:19 PM
> To: Lucene Users <java-user@lucene.apache.org>
> Subject: Re: Understanding performance characteristics of the new point
> types
>
> Thank you both for the explanation; we will switch to StringField with a
> TermQuery instead.
>
> On 02.11.2016 20:09, Michael McCandless wrote:
> > Yeah, it's best to use StringField for low-cardinality use cases.
> >
> > When cardinality is low (4 unique values in your case), legacy
> > numerics would rewrite to a BooleanQuery, which is much more
> > performant for MUST clauses, vs. dimensional points, which will always
> > need to construct an up-front bitset for all documents with that
> > value. Using StringField instead will ensure you always get a
> > BooleanQuery...
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> >
> > On Wed, Nov 2, 2016 at 2:43 PM, Fuad Efendi <f...@efendi.ca> wrote:
> >> Hi Florian,
> >>
> >> If my understanding is correct, you are using IntPoint to index 4 different
> >> document types, which is overkill; why not try a classic “non-tokenized”
> >> keyword field (a.k.a. “legacy string”) for document types? Cardinality is
> >> only four for document types.
> >>
> >>
> >> --
> >>
> >> Fuad Efendi
> >>
> >> (416) 993-2060
> >>
> >> http://www.tokenizer.ca
> >> Recommender Systems
> >>
> >>
> >> On November 2, 2016 at 2:10:14 PM, Florian Hopf (
> >> mailingli...@florian-hopf.de) wrote:
> >>
> >> Hi,
> >>
> >> we are indexing different types of documents in one Lucene index. They
> >> have most fields in common, but we need to filter some types for certain
> >> queries. We are using numeric values to determine the types of documents
> >> (1-4).
> >> Now, when querying these documents, we see that the performance
> >> degrades the more documents of a type are in the index.
> >>
> >> Using a simple test that indexes 10 million documents, I can see the
> >> following when filtering on everything but 100,000 documents:
> >>
> >> * When issuing the query alone, the new PointRangeQuery
> >> (IntPoint.newExactQuery) is a lot faster than term and legacy numeric
> >> (in my case around 2x the speed of the others)
> >> * When issuing a BooleanQuery that contains a TermQuery selecting 5
> >> documents together with a MUST query on the numeric field, the points
> >> are 5x slower than legacy numeric (LegacyNumericRangeQuery.newIntRange)
> >> and terms (TermQuery)
> >> * When doing the same thing with SHOULD instead of MUST for the
> >> additional term query, the PointRangeQuery is the fastest as well
> >>
> >> I suspect this is related to the discussion in
> >> https://issues.apache.org/jira/browse/LUCENE-7254
> >>
> >> Of course there could be something wrong with the way I am measuring the
> >> performance; I'd be happy to share the code. But what I read in the
> >> ticket above seems to hint that points are not suited for every use
> >> case. Is it recommended to use StringField in a case like this instead?
> >>
> >> Regards
> >> Florian
> >>
> >> --
> >> Florian Hopf
> >> Freelance Software Developer
> >>
> >> http://blog.florian-hopf.de
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
> --
> Florian Hopf
> Freelance Software Developer
>
> http://blog.florian-hopf.de
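Mike's explanation of why points fall behind in the MUST case can be made concrete with a toy cost model. This is a minimal sketch in plain Java with no Lucene dependency, and it is not Lucene's implementation. It assumes both clauses are already available as sorted doc-ID arrays: the broad clause is the document-type filter, and the selective clause is the 5-document TermQuery. The class name `ConjunctionCostSketch`, its methods, and all doc IDs are hypothetical. One path leapfrogs through both lists, driven by the rare clause, roughly the way a term-based conjunction can skip ahead. The other path first materializes a bitset of every document matching the broad clause, which is roughly what Mike describes for point queries.

```java
import java.util.Arrays;
import java.util.BitSet;

/**
 * Hypothetical toy cost model (NOT Lucene code) contrasting two ways to
 * intersect a selective clause (few docs) with a broad MUST clause (most docs).
 */
public class ConjunctionCostSketch {

    /** Returns the first index i >= from with postings[i] >= target, or postings.length. */
    static int advance(int[] postings, int from, int target) {
        // Real postings use skip data; binary search stands in for it here.
        int lo = from, hi = postings.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (postings[mid] < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Leapfrog intersection driven by the rare clause; counts advance() calls. */
    static int[] leapfrog(int[] rare, int[] common, int[] advances) {
        int[] out = new int[rare.length];
        int n = 0, j = 0;
        for (int doc : rare) {
            j = advance(common, j, doc); // skip the broad clause straight to this doc
            advances[0]++;
            if (j == common.length) break; // broad clause exhausted: no more matches
            if (common[j] == doc) out[n++] = doc;
        }
        return Arrays.copyOf(out, n);
    }

    /** Bitset-first intersection: every doc of the broad clause is visited up front. */
    static int[] bitsetFirst(int[] rare, int[] common, int maxDoc, int[] visited) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : common) {
            bits.set(doc);
            visited[0]++;
        }
        return Arrays.stream(rare).filter(bits::get).toArray();
    }

    public static void main(String[] args) {
        int maxDoc = 10_000_000;
        // Hypothetical broad clause: every doc except the last 100,000.
        int[] common = new int[maxDoc - 100_000];
        for (int i = 0; i < common.length; i++) common[i] = i;
        // Hypothetical selective clause: 5 doc IDs.
        int[] rare = {3, 1_000, 250_000, 5_000_000, 9_950_000};

        int[] advances = {0};
        int[] visited = {0};
        int[] a = leapfrog(rare, common, advances);
        int[] b = bitsetFirst(rare, common, maxDoc, visited);

        System.out.println("leapfrog hits=" + Arrays.toString(a) + " advances=" + advances[0]);
        System.out.println("bitset   hits=" + Arrays.toString(b) + " docs visited=" + visited[0]);
    }
}
```

Both paths return the same four hits. With these hypothetical numbers, the leapfrog path makes 5 advance calls, while the bitset path touches all 9,900,000 documents of the broad clause before intersecting. In this model the cost of the bitset path grows with the size of the broad clause, which matches the pattern Florian observed: performance degrades the more documents of a type are in the index.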