StandardTermsDictReader.java

Michael McCandless Sun, 22 Nov 2009 12:50:50 -0800

Yeah I think there will be lots of optimizing we can do, after flex lands.

Maybe stick w/ String for now?  But open an issue, today, to remind us
to cut over to char[] post-flex?


Doing all processing in UTF8 is tantalizing too ;)  This would mean no
conversion of the terms data on iterating from the terms dict...
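
A rough sketch of what staying in UTF8 could look like (the class and
method names here are hypothetical, not code from the flex branch): a
prefix check runs directly on the term's bytes, so no String or char[]
gets created per term.  The byte[][] below is just a stand-in for
whatever the terms dict would actually hand back.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names exist in Lucene.
class Utf8TermSketch {

    /** Returns true if the first termLen bytes of term begin with prefix. */
    static boolean startsWith(byte[] term, int termLen, byte[] prefix) {
        if (prefix.length > termLen) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (term[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Counts terms that begin with prefix, comparing raw UTF-8 bytes only. */
    static int countWithPrefix(byte[][] utf8Terms, byte[] prefix) {
        int count = 0;
        for (byte[] term : utf8Terms) {
            if (startsWith(term, term.length, prefix)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumption: terms arrive already encoded as UTF-8.
        byte[][] terms = {
            "flex".getBytes(StandardCharsets.UTF_8),
            "flexible".getBytes(StandardCharsets.UTF_8),
            "fl\u00e9chette".getBytes(StandardCharsets.UTF_8),
            "lucene".getBytes(StandardCharsets.UTF_8),
        };
        byte[] prefix = "fl".getBytes(StandardCharsets.UTF_8);
        System.out.println(countWithPrefix(terms, prefix)); // prints 3
    }
}
```

Matching a prefix byte-by-byte is safe with UTF8 because a multi-byte
sequence never appears inside another character's encoding; the query
side is encoded once up front, instead of decoding every term.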

Mike

On Sun, Nov 22, 2009 at 1:56 PM, Robert Muir <rcm...@gmail.com> wrote:
> ok, I only ask because some rework of this enum could be necessary to take
> advantage of the new api.
>
> examples include changing it to use char[] (easy) to prevent lots of string
> creation, which was unavoidable with TermEnum since it is based on string.
>
> i will never mention this again, but it could also run on byte[] pretty
> easily. However I think high-level processing like this should use utf-16
> processing, as java intended, although I'm pretty positive it would be
> extremely fast.
>
> On Sun, Nov 22, 2009 at 1:33 PM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>>
>> I think you should keep doing all LUCENE-1606 work (and, any other
>> issues) on trunk, and then we merge down to flex branch once it's
>> committed?
>>
>> We shouldn't hold up any trunk features because flex is
>> coming... merging down every so often seems manageable so far (Mark?).
>>
>> I'm hoping to finish flex soonish -- largely what remains (I think!)
>> is better testing (correctness & performance) of the 4-way
>> combinations.  I think the codecs approach is generally working
>> well.. the fact that we have initial Pulsing & PforDelta codecs
>> working is great.
>>
>> Mike
>>
>> On Sun, Nov 22, 2009 at 1:11 PM, Robert Muir <rcm...@gmail.com> wrote:
>> > Mike, I guess what I am implying is should i even bother with
>> > lucene-1606 and trunk?
>> >
>> > or instead, should i be helping you, looking at TermsEnum, and working
>> > on integrating it into flex?
>> >
>> > On Sun, Nov 22, 2009 at 1:05 PM, Michael McCandless
>> > <luc...@mikemccandless.com> wrote:
>> >>
>> >> On Sun, Nov 22, 2009 at 11:31 AM, Robert Muir <rcm...@gmail.com> wrote:
>> >>
>> >> >> No, not really... just an optimization I found when hunting ;)
>> >> >>
>> >> >> I'm working now on an AutomatonTermsEnum that uses the flex API
>> >> >> directly, to test that performance.
>> >> >>
>> >> >
>> >> > I didn't mean to 'bail out' on this
>> >>
>> >> You didn't 'bail out'; I 'bailed in' ;)  This is the joy of open
>> >> source... great big noisy Bazaar.
>> >>
>> >> > but I could not tell if TermsEnum was close to stabilized
>> >>
>> >> I think it's close; we need to do this port anyway, once automaton is
>> >> committed to trunk, so really I saved Mark some work ;)
>> >>
>> >> > and it might be significant work to convert it?
>> >>
>> >> It wasn't too bad, but maybe you can look it over once I post patch
>> >> and see if I messed anything up :)
>> >>
>> >> > Maybe benching numeric range would be easier and accomplish the same
>> >> > thing?
>> >>
>> >> Yeah benching NRQ would be good too... many benchmarks still to run.
>> >>
>> >> Mike
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>> >>
>> >
>> >
>> >
>> > --
>> > Robert Muir
>> > rcm...@gmail.com
>> >
>>
>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>

Re: svn commit: r883088 - in /lucene/java/branches/flex_1458/src/java/org/apache/lucene/index: TermRef.java codecs/standard/StandardTermsDictReader.java
