RE: TermRangeTermsEnum

Uwe Schindler Wed, 07 Aug 2013 06:28:44 -0700

Why don’t you use NumericRangeQuery’s enum? If the field is indexed as 
NumericField this should work.
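
To illustrate why the numeric enum tends to visit far fewer terms than a plain term range over full-precision values, here is a simplified sketch of trie-style range splitting, modeled on what NumericRangeQuery does for numeric fields. The class and method names (TrieRangeSketch, SubRange, split) are hypothetical, the precision step of 4 is an assumption, and the inputs are assumed to be non-negative longs for simplicity. It is not Lucene's actual code.

```java
import java.util.ArrayList;
import java.util.List;

class TrieRangeSketch {
    /** One sub-range, matched by terms indexed at the given shift (precision level). */
    static final class SubRange {
        final long min, max;
        final int shift;

        SubRange(long min, long max, int shift) {
            this.min = min;
            // At this precision the low `shift` bits are ignored, so fill them in.
            this.max = max | ((1L << shift) - 1L);
            this.shift = shift;
        }

        long valuesCovered() { return max - min + 1; }

        /** Upper bound on the number of distinct terms visited for this sub-range. */
        long termsNeeded() { return ((max - min) >>> shift) + 1; }
    }

    /** Splits [lower, upper] into sub-ranges of decreasing precision. */
    static List<SubRange> split(long lower, long upper, int precisionStep) {
        List<SubRange> out = new ArrayList<>();
        if (lower > upper) return out;
        long min = lower, max = upper;
        for (int shift = 0; ; shift += precisionStep) {
            long diff = 1L << (shift + precisionStep);
            long mask = ((1L << precisionStep) - 1L) << shift;
            boolean hasLower = (min & mask) != 0L;   // ragged lower edge at this level
            boolean hasUpper = (max & mask) != mask; // ragged upper edge at this level
            long nextMin = (hasLower ? min + diff : min) & ~mask;
            long nextMax = (hasUpper ? max - diff : max) & ~mask;
            boolean lastLevel = shift + precisionStep >= 64
                    || nextMin > nextMax || nextMin < min || nextMax > max;
            if (lastLevel) {
                // No coarser level fits: cover what is left at the current precision.
                out.add(new SubRange(min, max, shift));
                break;
            }
            if (hasLower) out.add(new SubRange(min, min | mask, shift));
            if (hasUpper) out.add(new SubRange(max & ~mask, max, shift));
            min = nextMin;
            max = nextMax;
        }
        return out;
    }

    static long totalValues(List<SubRange> ranges) {
        long n = 0;
        for (SubRange r : ranges) n += r.valuesCovered();
        return n;
    }

    static long totalTerms(List<SubRange> ranges) {
        long n = 0;
        for (SubRange r : ranges) n += r.termsNeeded();
        return n;
    }

    public static void main(String[] args) {
        List<SubRange> ranges = split(0L, 1000L, 4);
        System.out.println("values covered: " + totalValues(ranges));
        System.out.println("terms visited at most: " + totalTerms(ranges));
    }
}
```

The sub-ranges together cover every value in the requested range exactly once, while the number of terms that have to be visited stays well below the number of distinct full-precision values. A term range built from two shift-0 bounds, by contrast, walks every full-precision term between them.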


 

Uwe

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: [email protected]

 

From: [email protected] [mailto:[email protected]] On Behalf Of Chet Vora
Sent: Wednesday, August 07, 2013 3:18 PM
To: [email protected]
Subject: TermRangeTermsEnum

 

Hi

 

Posting this to dev as well, since it is related to Lucene internals.

 

Each document in my index has a double value, which falls between certain 
bounds, and an associated tag. I am trying to find all the docs which match a 
certain tag (or combination of tags) and whose value lies in a certain range. 
I'm trying to use the TermRangeTermsEnum from the Flex API as part of a custom 
parser. This is how I'm using it (in the getDocIdSet() method):

 

 

        // "Count" is the field I'm interested in for the range enum
        Terms myField = fields.terms("Count");
        termsEnum = myField.iterator(termsEnum);

        // Encode both double bounds as full-precision (shift 0) prefix-coded terms
        BytesRef lowerBound = new BytesRef();
        NumericUtils.longToPrefixCodedBytes(NumericUtils.doubleToSortableLong(lower), 0, lowerBound);
        BytesRef upperBound = new BytesRef();
        NumericUtils.longToPrefixCodedBytes(NumericUtils.doubleToSortableLong(upper), 0, upperBound);

        TermRangeTermsEnum termRangeTermsEnum =
                new TermRangeTermsEnum(termsEnum, lowerBound, upperBound, true, true);

        DocsEnum docs = null;
        FixedBitSet rangeFilter = new FixedBitSet(reader.maxDoc());

        // Create a bitset of all docs that pass the range filter
        while (termRangeTermsEnum.next() != null) {
            // no freqs since we don't need them
            docs = termRangeTermsEnum.docs(startResults, docs, DocsEnum.FLAG_NONE);
            while (docs.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
                rangeFilter.set(docs.docID());
            }
        }

        // "Tag" is the other field I want to filter by
        Terms tagField = fields.terms("Tag");
        termsEnum = tagField.iterator(termsEnum);

        // Filter by docs that match the tag.
        // tags is a field of the enclosing class: private String[] tags;
        Set<Integer> myIds = new HashSet<Integer>();

        for (String s : tags) {
            ref = new BytesRef(s);
            // don't use the cache since we could pollute it here easily
            if (termsEnum.seekExact(ref, false)) {
                // rangeFilter acts as the accepted-docs filter; no freqs since we don't need them
                docs = termsEnum.docs(rangeFilter, docs, DocsEnum.FLAG_NONE);
                while (docs.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
                    myIds.add(docs.docID());
                }
            }
        }

                      

This does return the results I want but doesn't perform very well. By 
comparison, using a plain TermsEnum and checking the range by hand performs 
much better: it is an order of magnitude faster for <1000 records and about 
3-4 times faster for more. 

 

        Terms tagField = fields.terms("Tag");
        termsEnum = tagField.iterator(termsEnum);

        Set<Integer> myIds = new HashSet<Integer>();
        double value;

        for (String s : tags) {
            ref = new BytesRef(s);
            // don't use the cache since we could pollute it here easily
            if (termsEnum.seekExact(ref, false)) {
                // no freqs since we don't need them
                docs = termsEnum.docs(initialSet, docs, DocsEnum.FLAG_NONE);
                while (docs.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
                    value = cache.get(docs.docID());
                    // check for the range (lowerBound and upperBound are plain doubles here)
                    if (value >= lowerBound && value <= upperBound) {
                        myIds.add(docs.docID());
                    }
                }
            }
        }
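
The difference between the two approaches above can be sketched without Lucene. In this toy model (all names are hypothetical: RangeTagSketch, invert, bitSetFirst, checkPerDoc), a TreeMap stands in for the numeric field's term dictionary and postings, and a plain double[] stands in for the per-document value cache. It only illustrates where the work goes and is not a benchmark.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.TreeMap;

class RangeTagSketch {
    /** Toy inverted index for the numeric field: value -> IDs of docs holding that value. */
    static TreeMap<Double, List<Integer>> invert(double[] valueByDoc) {
        TreeMap<Double, List<Integer>> postings = new TreeMap<>();
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            postings.computeIfAbsent(valueByDoc[doc], k -> new ArrayList<>()).add(doc);
        }
        return postings;
    }

    /**
     * Approach 1: walk every term in the range, set a bit for each of its docs,
     * then keep the tag's docs whose bit is set. Cost grows with the number of
     * terms and docs inside the range, no matter how few docs carry the tag.
     */
    static List<Integer> bitSetFirst(TreeMap<Double, List<Integer>> valuePostings, int maxDoc,
                                     int[] tagDocs, double lower, double upper) {
        BitSet inRange = new BitSet(maxDoc);
        for (List<Integer> docs : valuePostings.subMap(lower, true, upper, true).values()) {
            for (int doc : docs) {
                inRange.set(doc);
            }
        }
        List<Integer> hits = new ArrayList<>();
        for (int doc : tagDocs) {
            if (inRange.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /**
     * Approach 2: walk only the tag's docs and compare each doc's cached value
     * against the bounds. Cost grows with the number of docs carrying the tag.
     */
    static List<Integer> checkPerDoc(double[] valueByDoc, int[] tagDocs,
                                     double lower, double upper) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : tagDocs) {
            double value = valueByDoc[doc];
            if (value >= lower && value <= upper) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        double[] valueByDoc = {1.0, 5.0, 3.0, 9.0, 5.0}; // value for doc 0..4
        int[] tagDocs = {0, 2, 4};                        // docs carrying the tag
        System.out.println(bitSetFirst(invert(valueByDoc), valueByDoc.length, tagDocs, 2.0, 6.0));
        System.out.println(checkPerDoc(valueByDoc, tagDocs, 2.0, 6.0));
    }
}
```

Both methods return the same doc IDs. When the range is wide and the tags are selective, the first approach spends most of its time on docs that the tag filter later discards, which is one plausible reason for the gap described above.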

 

 

Is this the expected usage of TermRangeTermsEnum? Is this the expected 
performance as well? Any pointers or helpful references for doing this in a 
more performant way are welcome.

 

Regards,

CV
