Hi,

We have about 120 filters; about half of them are selective, but some filters are "boolean".
It's easy to see where the difference comes from: binarySearchLookup in DocTermsIndexImpl versus StringIndex.

In StringIndex, each step is just a comparison between Strings:

    int cmp = lookup[mid].compareTo(key);

In DocTermsIndexImpl, the BytesRef has to be retrieved first:

    public BytesRef lookup(int ord, BytesRef ret) {
      return bytes.fill(ret, termOrdToBytesOffset.get(ord));
    }

Emmanuel

2013/1/20 Uwe Schindler <u...@thetaphi.de>

> Hi,
>
> in Lucene 4.0 I would recommend to use TermsFilter (from queries module),
> not FieldCacheTermsFilter, because the term dictionary is much faster and
> it is in this case better to use the posting lists, instead of scanning all
> documents (which FCTermsCache does). How many filter terms do you have? Is
> the filter selective? To further improve, use CachingWrapperFilter, too
> (this will cache filter results, which is useful if you have a set of
> Filters/terms that are used quite often).
> The problem with FCTermsFilter is: It scans all documents from beginning
> to end and looks them up in the terms cache. In Lucene 4.0 the structure of
> the FieldCache changed to be more memory efficient (which does not hurt the
> primary use-case of sorting), but scanning all documents and resolving all
> terms is not always the best option (this also heavily relies on your index
> structure, FCTermsFilter may still be faster under some circumstances).
>
> Uwe
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: emmanuel Gosse [mailto:emmanuel.go...@gmail.com]
> > Sent: Saturday, January 19, 2013 10:58 PM
> > To: java-user@lucene.apache.org
> > Subject: FieldCacheTermsFilter performance
> >
> > Hi,
> >
> > I would like to share a performance problem with FieldCacheTermsFilter
> > between Lucene versions 3.0.3 and 4.0.0.
> >
> > I ran tests with the same application on 3.0.3 (my production
> > version) and 4.0.0.
> > And I found a "big" difference in response time.
> >
> > I ran a "real life" injection of 400 000 queries and measured the average
> > response time.
> > I usually run this type of test to validate that we have no performance
> > regression.
> >
> > So I ran other tests to find out where this difference comes from:
> > deactivating faceting, changing the Directory used, and so on.
> >
> > And for one test, I deactivated the filters (I use only
> > FieldCacheTermsFilter) and I obtained the same average response time.
> >
> > To give some data:
> > 20 million documents
> > 3 indexes under a MultiReader
> > no indexing, only searching (indexing is not implemented in this app)
> > 400 000 queries with JMeter
> >
> > Test:
> >
> > 3.0.3 or 4.0.0
> > Queries without filters: 60ms (average response time)
> >
> > Queries with filters:
> > 3.0.3: 150ms
> > 4.0.0: 400ms
> >
> > The only code difference in my application is what is required to plug
> > into each Lucene version.
> >
> > The fields used for filtering are not stored and, in the 4.0.0 version,
> > are StringField.
> > I checked that the FieldCache caches don't change during the test.
> >
> > I have no more ideas to pursue. Maybe I have not understood which type
> > of field I should use.
> >
> > Emmanuel
> >
> > -----------
> > Emmanuel Gosse
> > Fnac.Com <http://www.fnac.com>

--
Emmanuel Gosse
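
Illustration of the lookup difference described at the top of this message. This is only a minimal, self-contained sketch and not Lucene's code: the names OrdLookupSketch, PackedTerms, searchStrings and search are hypothetical, the terms are assumed to be ASCII and already sorted, and the packed byte[] plus offset array is a simplified stand-in for what bytes.fill(...) and termOrdToBytesOffset do. It only shows that the 3.x-style search does one String.compareTo per probe, while the 4.0-style search must first locate the term's bytes for each probe before comparing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Simplified model of the two term-lookup styles; NOT Lucene's actual code. */
final class OrdLookupSketch {

    /** 3.x-style model: terms held as a sorted String[]; one compareTo per probe. */
    static int searchStrings(String[] sortedTerms, String key) {
        int low = 0, high = sortedTerms.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = sortedTerms[mid].compareTo(key);
            if (cmp < 0) low = mid + 1;
            else if (cmp > 0) high = mid - 1;
            else return mid;               // found: return the ordinal
        }
        return -(low + 1);                 // not found: encoded insertion point
    }

    /** 4.0-style model: all terms packed into one byte[], one offset per ordinal. */
    static final class PackedTerms {
        final byte[] bytes;
        final int[] offsets;               // term ord spans offsets[ord]..offsets[ord + 1]

        PackedTerms(String[] sortedTerms) {
            offsets = new int[sortedTerms.length + 1];
            byte[][] encoded = new byte[sortedTerms.length][];
            int total = 0;
            for (int i = 0; i < sortedTerms.length; i++) {
                encoded[i] = sortedTerms[i].getBytes(StandardCharsets.UTF_8);
                offsets[i] = total;
                total += encoded[i].length;
            }
            offsets[sortedTerms.length] = total;
            bytes = new byte[total];
            for (int i = 0; i < encoded.length; i++) {
                System.arraycopy(encoded[i], 0, bytes, offsets[i], encoded[i].length);
            }
        }

        /** Each probe resolves the ordinal to a byte range, then compares bytes. */
        int search(byte[] key) {
            int low = 0, high = offsets.length - 2;
            while (low <= high) {
                int mid = (low + high) >>> 1;
                int cmp = Arrays.compareUnsigned(
                        bytes, offsets[mid], offsets[mid + 1], key, 0, key.length);
                if (cmp < 0) low = mid + 1;
                else if (cmp > 0) high = mid - 1;
                else return mid;
            }
            return -(low + 1);
        }
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry"};
        PackedTerms packed = new PackedTerms(terms);
        byte[] key = "banana".getBytes(StandardCharsets.UTF_8);
        System.out.println(searchStrings(terms, "banana"));
        System.out.println(packed.search(key));
    }
}
```

Both searches return the same ordinal for the same term; the sketch says nothing about actual timings, which depend on the real Lucene data structures and on the index.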