On Monday 10 November 2008 22:21:20, Tim Sturge wrote:
> Hmmm -- I hadn't thought about that so I took a quick look at the
> term vector support.
>
> What I'm really looking for is a compact but performant
> representation of a set of filters on the same (one term field).
> Using term vectors would mean an algorithm similar to:
>
> String myfield;
> String myterm;
> TermFreqVector tv;
> for (int i = 0 ; i < maxDoc ; i++) {
>   tv = reader.getTermFreqVector(i, myfield);
>   if (tv.indexOf(myterm) != -1) {
>     // include this doc...
>   }
> }
>
> The key thing I am looking to achieve here is performance comparable
> to filters. I suspect getTermFreqVector() is not efficient enough but
> I'll give it a try.
>
Better use a TermDocs on myterm for this; have a look at the code of
RangeFilter.

Filters are normally created from a slower query by setting a bit in an
OpenBitSet at "include this doc". Then they are reused for their speed.

Filter caching could help. In case memory becomes a problem and the
filters are sparse enough, try using SortedVIntList as the underlying
data structure in the cache. (Sparse enough means fewer than 1 in 8 of
all docs available in the index reader.)

See also LUCENE-1296 for caching a data structure other than the one
used to collect the filtered docs.

Regards,
Paul Elschot
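
To illustrate where the 1 in 8 figure comes from, here is a minimal standalone sketch. It does not use Lucene and is not the SortedVIntList implementation; the class and method names (SparseFilterSketch, encodeSortedDocIds, decode, bitSetBytes) are made up for this example. It assumes the doc ids are stored as gaps between consecutive ids, each gap written as a variable-length int with 7 payload bits per byte. A bit set always costs one bit per doc in the index, while such a list costs at least one byte per matching doc, so the list can only be smaller when fewer than 1 in 8 docs match.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class SparseFilterSketch {

    /** Encodes strictly increasing doc ids as variable-length gaps. */
    static byte[] encodeSortedDocIds(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : sortedDocIds) {
            int gap = docId - previous;
            // 7 payload bits per byte; a set high bit means more bytes follow.
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
            previous = docId;
        }
        return out.toByteArray();
    }

    /** Decodes the byte array back into the original doc ids. */
    static int[] decode(byte[] encoded) {
        List<Integer> docIds = new ArrayList<>();
        int pos = 0;
        int previous = 0;
        while (pos < encoded.length) {
            int gap = 0;
            int shift = 0;
            int b;
            do {
                b = encoded[pos++] & 0xFF;
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += gap;
            docIds.add(previous);
        }
        int[] result = new int[docIds.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = docIds.get(i);
        }
        return result;
    }

    /** Bytes needed by a plain bit set with one bit per doc. */
    static int bitSetBytes(int maxDoc) {
        return (maxDoc + 7) / 8;
    }

    public static void main(String[] args) {
        int maxDoc = 1000000;
        // Hypothetical filter: every 20th doc matches (sparser than 1 in 8).
        int[] docIds = new int[maxDoc / 20];
        for (int i = 0; i < docIds.length; i++) {
            docIds[i] = i * 20;
        }
        byte[] encoded = encodeSortedDocIds(docIds);
        System.out.println("gap-encoded list: " + encoded.length + " bytes");
        System.out.println("bit set:          " + bitSetBytes(maxDoc) + " bytes");
    }
}
```

For filters denser than 1 in 8 the bit set wins on size, and it allows random access by doc id, which the gap-encoded list does not.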