Re: Actual min and max-value of NumericField during codec flush

Michael McCandless Thu, 06 Feb 2014 03:06:35 -0800

Somewhere in those numeric trie terms are the exact integers from your
documents, encoded.

You can use oal.util.NumericUtils.prefixCodedToInt to get the int
value back from the BytesRef term.

But you need to filter out the "higher level" (lower-precision) terms,
e.g. using NumericUtils.getPrefixCodedIntShift(term) == 0 (the Int
variant, since your field is an IntField).  Or use
NumericUtils.filterPrefixCodedInts to wrap a TermsEnum.  I believe
all the terms you want come first, so once you hit a term where
getPrefixCodedIntShift is > 0, the term just before it is your max and
you can stop checking.
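
To make the term ordering concrete, here is a minimal, self-contained
sketch. It does not call Lucene: encode, shiftOf, decode, indexTerms and
minMax are hypothetical stand-ins for the NumericUtils methods above, and
the byte layout (one shift byte, then 7-bit chunks) is an assumption
modeled on the trie encoding, not a guaranteed copy of it. It shows why
the last term of the field is not the max, and how stopping at the first
term with shift > 0 recovers the real min and max.

```java
import java.util.Comparator;
import java.util.TreeSet;

final class TrieMinMaxSketch {
    // Assumed base value for the leading "shift" byte of every term.
    static final int SHIFT_START = 0x60;

    /** Encodes value at the given shift: one shift byte, then 7-bit chunks, high bits first. */
    static byte[] encode(int value, int shift) {
        int n = (31 - shift) / 7 + 1;              // number of 7-bit chunks needed
        byte[] term = new byte[n + 1];
        term[0] = (byte) (SHIFT_START + shift);
        int bits = (value ^ 0x80000000) >>> shift; // flip sign bit so byte order follows int order
        for (int i = n; i > 0; i--) {
            term[i] = (byte) (bits & 0x7f);
            bits >>>= 7;
        }
        return term;
    }

    /** Reads the shift back from the leading byte. */
    static int shiftOf(byte[] term) {
        return term[0] - SHIFT_START;
    }

    /** Reverses encode(); only exact for terms with shift == 0. */
    static int decode(byte[] term) {
        int bits = 0;
        for (int i = 1; i < term.length; i++) {
            bits = (bits << 7) | term[i];
        }
        return (bits << shiftOf(term)) ^ 0x80000000;
    }

    // Unsigned lexicographic byte order, the order terms are visited in.
    static final Comparator<byte[]> BYTE_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) return Integer.compare(a[i] & 0xff, b[i] & 0xff);
        }
        return Integer.compare(a.length, b.length);
    };

    /** Produces every trie term for the values: shift 0, step, 2*step, ... below 32. */
    static TreeSet<byte[]> indexTerms(int[] values, int precisionStep) {
        TreeSet<byte[]> terms = new TreeSet<>(BYTE_ORDER);
        for (int v : values) {
            for (int shift = 0; shift < 32; shift += precisionStep) {
                terms.add(encode(v, shift));
            }
        }
        return terms;
    }

    /** Walks terms in order, keeps first/last full-precision term, stops at the first shift > 0. */
    static int[] minMax(TreeSet<byte[]> terms) {
        byte[] first = null;
        byte[] last = null;
        for (byte[] term : terms) {
            if (shiftOf(term) > 0) break; // lower-precision terms start here
            if (first == null) first = term;
            last = term;
        }
        if (first == null) throw new IllegalArgumentException("no full-precision terms");
        return new int[] { decode(first), decode(last) };
    }

    public static void main(String[] args) {
        TreeSet<byte[]> terms = indexTerms(new int[] {42, -7, 1000, 3}, 4);
        int[] mm = minMax(terms);
        System.out.println("min=" + mm[0] + " max=" + mm[1]);
        System.out.println("naive last term decodes to " + decode(terms.last()));
    }
}
```

In a real codec the same loop would run over the field's terms, with
the NumericUtils calls in place of the helpers here.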

BTW, in 5.0, the codec API for PostingsFormat has improved, so that
you can e.g. pull your own TermsEnum and iterate the terms yourself.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Feb 6, 2014 at 5:16 AM, Ravikumar Govindarajan
<ravikumar.govindara...@gmail.com> wrote:
> I use a Codec to flush data. All methods delegate to actual Lucene42Codec,
> except for intercepting one single-field. This field is indexed as an
> IntField [Numeric-Trie...], with precisionStep=4.
>
> The purpose of the Codec is as follows
>
> 1. Note the first BytesRef for this field
> 2. During finish() call [TermsConsumer.java], note the last BytesRef for
> this field
> 3. Converts both the first/last BytesRef to respective integers
> 4. Store these 2 ints in segment-info diagnostics
>
> The problem with this approach is that, first/last BytesRef is totally
> different from the actual "int" values I try to index. I guess, this is
> because Numeric-Trie explodes all the integers into its own format of
> BytesRefs. Hence my Codec stores the wrong values in segment-diagnostics
>
> Is there a way I can record actual min/max int-values correctly in my codec
> and still support NumericRange search?
>
> --
> Ravi

