Hi,

We have about 120 filters; half of them are selective, but some are "boolean".


It is easy to see where the difference comes from.

Compare binarySearchLookup in DocTermsIndexImpl with the one in StringIndex:

In StringIndex, it is just a comparison between Strings:
int cmp = lookup[mid].compareTo(key);

In DocTermsIndexImpl, the BytesRef has to be retrieved before each comparison:

public BytesRef lookup(int ord, BytesRef ret) {
  return bytes.fill(ret, termOrdToBytesOffset.get(ord));
}
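
To make the extra step concrete, here is a minimal, self-contained sketch of the two lookup styles. It does not use the Lucene classes: LookupSketch, PackedTerms and their methods are hypothetical stand-ins that I made up for illustration, and the packed byte[] plus int[] offsets only approximate the idea of resolving an ord to an offset and then to bytes before each comparison. It assumes ASCII terms, so String order and byte order agree.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class LookupSketch {

    // 3.x-style stand-in: the sorted terms are held as String objects.
    static int searchStrings(String[] lookup, String key) {
        int low = 0, high = lookup.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = lookup[mid].compareTo(key); // direct String comparison
            if (cmp < 0) low = mid + 1;
            else if (cmp > 0) high = mid - 1;
            else return mid;
        }
        return -(low + 1); // not found: encoded insertion point
    }

    // 4.0-style stand-in (hypothetical): all terms packed into one byte[].
    static final class PackedTerms {
        final byte[] bytes;
        final int[] offsets; // term ord occupies bytes[offsets[ord], offsets[ord + 1])

        PackedTerms(String[] sortedTerms) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            offsets = new int[sortedTerms.length + 1];
            for (int i = 0; i < sortedTerms.length; i++) {
                offsets[i] = out.size();
                byte[] b = sortedTerms[i].getBytes(StandardCharsets.UTF_8);
                out.write(b, 0, b.length);
            }
            offsets[sortedTerms.length] = out.size();
            bytes = out.toByteArray();
        }

        // Every probe first resolves ord -> offset -> byte slice, then compares
        // the slice with the key as unsigned bytes.
        int compare(int ord, byte[] key) {
            int start = offsets[ord];
            int len = offsets[ord + 1] - start;
            int n = Math.min(len, key.length);
            for (int i = 0; i < n; i++) {
                int a = bytes[start + i] & 0xff;
                int b = key[i] & 0xff;
                if (a != b) return a - b;
            }
            return len - key.length;
        }

        int search(byte[] key) {
            int low = 0, high = offsets.length - 2;
            while (low <= high) {
                int mid = (low + high) >>> 1;
                int cmp = compare(mid, key); // indirection on every step
                if (cmp < 0) low = mid + 1;
                else if (cmp > 0) high = mid - 1;
                else return mid;
            }
            return -(low + 1); // not found: encoded insertion point
        }
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry", "date"};
        PackedTerms packed = new PackedTerms(terms);
        System.out.println(searchStrings(terms, "cherry"));
        System.out.println(packed.search("cherry".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Both searches return the same ordinal; the only point of the sketch is that the packed variant does extra work per probe, which may add up when a filter resolves many terms.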


Emmanuel


2013/1/20 Uwe Schindler <u...@thetaphi.de>

> Hi,
>
> In Lucene 4.0 I would recommend using TermsFilter (from the queries module)
> rather than FieldCacheTermsFilter, because the term dictionary is much faster
> and in this case it is better to use the posting lists than to scan all
> documents (which FieldCacheTermsFilter does). How many filter terms do you
> have? Is the filter selective? To improve further, also use
> CachingWrapperFilter (it caches filter results, which is useful if you have
> a set of filters/terms that are used quite often).
> The problem with FieldCacheTermsFilter is that it scans all documents from
> beginning to end and looks each one up in the terms cache. In Lucene 4.0 the
> structure of the FieldCache changed to be more memory efficient (which does
> not hurt the primary use case of sorting), but scanning all documents and
> resolving all terms is not always the best option (this also depends heavily
> on your index structure; FieldCacheTermsFilter may still be faster under
> some circumstances).
>
> Uwe
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -----Original Message-----
> > From: emmanuel Gosse [mailto:emmanuel.go...@gmail.com]
> > Sent: Saturday, January 19, 2013 10:58 PM
> > To: java-user@lucene.apache.org
> > Subject: FieldCacheTermsFilter performance
> >
> > Hi,
> >
> > I would like to share a performance problem with FieldCacheTermsFilter
> > between Lucene versions 3.0.3 and 4.0.0.
> >
> > I ran tests with the same application on 3.0.3 (my production version)
> > and on 4.0.0, and I found a "big" difference in response time.
> >
> > I run a "real life" injection of 400 000 queries and measure the average
> > response time. I usually run this type of test to validate that we have
> > no performance regression.
> >
> > So I ran other tests to find out where this difference comes from:
> > deactivating faceting, changing the Directory used, and so on.
> >
> > In one test, I deactivated the filters (I use only
> > FieldCacheTermsFilter) and obtained the same average response time with
> > both versions.
> >
> > To give some data:
> > 20 million documents
> > 3 indexes under a MultiReader
> > no indexing, only searching (indexing is not implemented in this app)
> > 400 000 queries with JMeter
> >
> > Test :
> >
> > 3.0.3 or 4.0.0
> > Queries without filters : 60ms (average response time)
> >
> > Queries with filters:
> > 3.0.3 : 150ms
> > 4.0.0 : 400ms
> >
> > The only code difference in my application is what is required to plug
> > into each Lucene version.
> >
> > The fields used for filtering are not stored and, in the 4.0.0 version,
> > are StringField.
> > I checked that the FieldCache caches do not change during the test.
> >
> > I have no more ideas to pursue. Maybe I have not understood which type of
> > field I should use.
> >
> > Emmanuel
> >
> > -----------
> > Emmanuel Gosse
> > Fnac.Com <http://www.fnac.com>
>


-- 
Emmanuel Gosse
06 65 26 96 71
