I guess this is where I point out that Unicode and Java are optimized for UTF-16 processing. So while I agree that byte[] should be available in places like this for flex indexing, I'm already nervous about seeing code and optimizations that only work well with Latin-1 and are very slow or buggy for anything else.
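To make the worry concrete, here is a minimal sketch of the kind of bug I mean. It is not taken from Lucene; the class and method names are made up for illustration. Code that treats each char as a whole character works for Latin-1 and the rest of the BMP, but silently corrupts supplementary characters (the surrogate pairs introduced with Unicode 4 support in Java 5).

```java
public class SurrogateDemo {
    // Hypothetical helper: reverses char by char, which is only safe
    // when every character fits in a single UTF-16 code unit.
    static String reverseByChar(String s) {
        char[] in = s.toCharArray();
        char[] out = new char[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[in.length - 1 - i];
        }
        return new String(out);
    }

    // Hypothetical helper: walks backwards one code point at a time,
    // so a surrogate pair is copied as a unit and stays in order.
    static String reverseByCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = s.length();
        while (i > 0) {
            int cp = s.codePointBefore(i);
            sb.appendCodePoint(cp);
            i -= Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'a', U+1D50A (encoded as the pair D835 DD0A), 'b'
        String s = "a\uD835\uDD0Ab";
        System.out.println("chars:       " + s.length());
        System.out.println("code points: " + s.codePointCount(0, s.length()));
        // The char-based version swaps the surrogates and yields malformed UTF-16.
        System.out.println("char-based reverse intact: "
                + reverseByChar(s).equals("b\uD835\uDD0Aa"));
        System.out.println("code-point reverse intact: "
                + reverseByCodePoint(s).equals("b\uD835\uDD0Aa"));
    }
}
```

The same mistake is at least as easy to make over byte[], where a single character can span up to four UTF-8 bytes, which is why I'd rather get the char[] code right first.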
On Sun, Nov 22, 2009 at 3:58 PM, Michael McCandless <luc...@mikemccandless.com> wrote:

> On Sun, Nov 22, 2009 at 3:52 PM, Robert Muir <rcm...@gmail.com> wrote:
> >
> > On Sun, Nov 22, 2009 at 3:50 PM, Michael McCandless
> > <luc...@mikemccandless.com> wrote:
> >>
> >> Yeah I think there will be lots of optimizing we can do, after flex lands.
> >>
> >> Maybe stick w/ String for now? But open an issue, today, to remind us
> >> to cutover to char[] post-flex?
> >
> > ok, i'll create one.
>
> Thanks.
>
> >> Doing all processing in UTF8 is tantalizing too ;) This would mean no
> >> conversion of the terms data on iterating from the terms dict...
> >
> > lets please not go this route :) its gonna be enough trouble fixing the
> > char[]-based code for unicode 4, forget about byte[]
>
> I'll defer to you ;)
>
> Mike

--
Robert Muir
rcm...@gmail.com