Yeah I think there will be lots of optimizing we can do, after flex lands. Maybe stick w/ String for now? But open an issue, today, to remind us to cutover to char[] post-flex?
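To make the String vs. char[] trade-off above concrete, here is a minimal sketch. It is not code from LUCENE-1606 or the flex branch: `CharTermSketch`, `TermBuffer`, `startsWith` and `countMatches` are hypothetical names, and the input terms are plain Strings only to keep the demo self-contained. The idea is that one char[] buffer is reused for every term, so matching does not have to allocate a new String per term.

```java
public class CharTermSketch {

    /** Hypothetical reusable buffer: overwritten for each term instead of allocating a String. */
    static final class TermBuffer {
        char[] chars = new char[16];
        int length;

        /** Copy the term into the buffer, growing it only when the term does not fit. */
        void set(CharSequence term) {
            if (term.length() > chars.length) {
                chars = new char[Math.max(term.length(), chars.length * 2)];
            }
            for (int i = 0; i < term.length(); i++) {
                chars[i] = term.charAt(i);
            }
            length = term.length();
        }
    }

    /** Prefix test done directly on the first termLen chars of the buffer. */
    public static boolean startsWith(char[] term, int termLen, char[] prefix) {
        if (prefix.length > termLen) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (term[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Count the terms that start with the prefix, reusing a single buffer throughout. */
    public static int countMatches(String[] terms, String prefix) {
        TermBuffer buf = new TermBuffer();
        char[] p = prefix.toCharArray();
        int matches = 0;
        for (String t : terms) {
            // Assumption: a real terms enum would fill the buffer from the index, not from a String.
            buf.set(t);
            if (startsWith(buf.chars, buf.length, p)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] terms = {"lucene", "lucid", "flex"};
        System.out.println(countMatches(terms, "luc")); // prints 2
    }
}
```

Whether this is actually faster than the String-based path would need benchmarking; the sketch only shows where the per-term allocation goes away.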
Doing all processing in UTF8 is tantalizing too ;) This would mean no
conversion of the terms data on iterating from the terms dict...

Mike

On Sun, Nov 22, 2009 at 1:56 PM, Robert Muir <rcm...@gmail.com> wrote:
> ok, I only ask because some rework of this enum could be necessary to take
> advantage of the new api.
>
> examples include changing it to use char[] (easy) to prevent lots of string
> creation, which was unavoidable with TermEnum since it is based on string.
>
> i will never mention this again, but it could also run on byte[] pretty
> easily.
> However I think high-level processing like this should use utf-16
> processing, as java intended, although I'm pretty positive it would be
> extremely fast.
>
> On Sun, Nov 22, 2009 at 1:33 PM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>>
>> I think you should keep doing all LUCENE-1606 work (and, any other
>> issues) on trunk, and then we merge down to flex branch once it's
>> committed?
>>
>> We shouldn't hold up any trunk features because flex is
>> coming... merging down every so often seems manageable so far (Mark?).
>>
>> I'm hoping to finish flex soonish -- largely what remains (I think!)
>> is better testing (correctness & performance) of the 4-way
>> combinations. I think the codecs approach is generally working
>> well.. the fact that we have initial Pulsing & PforDelta codecs
>> working is great.
>>
>> Mike
>>
>> On Sun, Nov 22, 2009 at 1:11 PM, Robert Muir <rcm...@gmail.com> wrote:
>> > Mike, I guess what I am implying is should i even bother with
>> > lucene-1606 and trunk?
>> >
>> > or instead, should i be helping you, looking at TermsEnum, and working
>> > on integrating it into flex?
>> >
>> > On Sun, Nov 22, 2009 at 1:05 PM, Michael McCandless
>> > <luc...@mikemccandless.com> wrote:
>> >>
>> >> On Sun, Nov 22, 2009 at 11:31 AM, Robert Muir <rcm...@gmail.com> wrote:
>> >>
>> >> >> No, not really... just an optimization I found when hunting ;)
>> >> >>
>> >> >> I'm working now on an AutomatonTermsEnum that uses the flex API
>> >> >> directly, to test that performance.
>> >> >>
>> >> >
>> >> > I didn't mean to 'bail out' on this
>> >>
>> >> You didn't 'bail out'; I 'bailed in' ;) This is the joy of open
>> >> source... great big noisy Bazaar.
>> >>
>> >> > but I could not tell if TermsEnum was close to stabilized
>> >>
>> >> I think it's close; we need to do this port anyway, once automaton is
>> >> committed to trunk, so really I saved Mark some work ;)
>> >>
>> >> > and it might be significant work to convert it?
>> >>
>> >> It wasn't too bad, but maybe you can look it over once I post patch
>> >> and see if I messed anything up :)
>> >>
>> >> > Maybe benching numeric range would be easier and accomplish the same
>> >> > thing?
>> >>
>> >> Yeah benching NRQ would be good too... many benchmarks still to run.
>> >>
>> >> Mike
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>> >>
>> >
>> > --
>> > Robert Muir
>> > rcm...@gmail.com
>> >
>>
>
> --
> Robert Muir
> rcm...@gmail.com