I ran into this problem using the current Lucene implementation of RangeQuery applied to genome data (searching a chromosome range from 1..20MB). We wanted to use Lucene queries like

  +organism:fruitfly +chromosome:X +location:[1000000 5000000]

to find all the genome features (1000s to 100,000s) that are listed in some megabase range of a genome. This failed quickly, even with small ranges, using the basic Lucene RangeQuery.

My solution was to score each document that falls in the query range into a BitSet:

  class NumRangeQuery extends Query
    public NumRangeQuery(Term first, Term last, boolean inc);

  -- full numeric (integer) range query, can handle large ranges.
  -- makes a BitSet of documents within range once, and feeds it back to
     Searcher thru score(HitCollector c, int end) as often as called.
  -- query semantics are the same as for RangeQuery
  -- implicit assumptions are
     -- first, last Term have integer values, as does the indexed field
     -- indexed field is recoded for alphanumeric sorting;
        e.g. 2 -> 0000000002, 10 -> 0000000010, -3 -> -0000000003

Find this as part of the 'LuceGene' package for searching genome and bioinformatics databases at
  http://www.gmod.org/lucegene/
with Lucene-related source code in cvs here:
  http://cvs.sourceforge.net/viewcvs.py/gmod/lucegene/src/org/eugenes/index/
  NumRangeQuery.java   -- range searches of integer fields.
  LGQueryParser.java   -- extension of QueryParser for NumRangeQuery (& other)
  BioDataAnalyzer.java -- NumberField formats field for indexing

-- Don Gilbert

> Date: Tue, 18 May 2004 13:35:55 -0700
> From: Andy Goodell <[EMAIL PROTECTED]>
> Subject: How to handle range queries over large ranges and avoid Too Many Boolean cla
>
> In our application we had a similar problem with non-date ranges until
> we realized that it wasn't so much that we were searching for the
> values in the range as restricting the search to that range, and then
> we used an extension to the org.apache.lucene.search.Filter class, and
> our implementation got much simpler and faster.
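
For readers who want the idea without pulling the CVS source, here is a minimal, Lucene-free sketch of the two pieces described above: the zero-padded recoding and the one-pass BitSet of documents in range. It is not the NumRangeQuery code itself. The class name NumRangeSketch, the methods encode, docsInRange and sampleIndex, and the TreeMap standing in for the term index are all assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.TreeMap;

public class NumRangeSketch {

    // Recode an integer as a fixed-width string: 2 -> 0000000002, -3 -> -0000000003.
    // String order matches numeric order for non-negative values only, so this
    // sketch queries non-negative ranges.
    static String encode(int value) {
        String digits = String.format("%010d", Math.abs((long) value));
        return value < 0 ? "-" + digits : digits;
    }

    // Hypothetical stand-in for a term index: encoded term -> ids of documents
    // that contain it. Walks the terms in range once and marks every matching
    // document, instead of building one boolean clause per term.
    static BitSet docsInRange(TreeMap<String, int[]> index,
                              int first, int last, boolean inclusive, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        String lo = encode(first);
        String hi = encode(last);
        if (lo.compareTo(hi) > 0) {
            return bits; // empty range
        }
        for (int[] docs : index.subMap(lo, inclusive, hi, inclusive).values()) {
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Made-up sample data: six documents with a "location" value each.
    static TreeMap<String, int[]> sampleIndex() {
        TreeMap<String, int[]> index = new TreeMap<>();
        index.put(encode(500000), new int[] {0});
        index.put(encode(1000000), new int[] {1});
        index.put(encode(2500000), new int[] {2, 3});
        index.put(encode(5000000), new int[] {4});
        index.put(encode(7000000), new int[] {5});
        return index;
    }

    public static void main(String[] args) {
        BitSet hits = docsInRange(sampleIndex(), 1000000, 5000000, true, 6);
        System.out.println(hits); // prints {1, 2, 3, 4}
    }
}
```

The cost depends on the number of terms and documents inside the range, and the resulting BitSet can be reused for as long as the index is unchanged.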
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- [EMAIL PROTECTED]://marmot.bio.indiana.edu/
