Well, in all honesty there is a bit of complexity. I leave the StandardTokenizer out of this: it gives the same results regardless of JVM version. It may not be correct, but it's consistent; we could wait till 5.0 or 10.0 to make it correct :) Also, because it gives the same results regardless of JVM version, we can actually use the Version logic to improve it, as Uwe showed.
The rest of it is where it gets nasty. Fixing the Simple/StopAnalyzer is actually the worst, because we have to deprecate the isTokenChar(char) and normalize(char) callbacks in favor of int-based versions. We also have to fix the I/O buffering logic in, for example, CharTokenizer, which does things like refill a buffer of size 4096 without checking that it doesn't split a surrogate pair. And then we have contrib...!

So you see why I ask about 'index backwards compatibility': I don't consider it actually working between 2.9->3.0 anyway, and adding that on top of fixing this stuff and ensuring API backwards compat is especially nasty.

> Always depends though. This double index thing you mention is nasty (3.0
> and 3.1 for the unfortunate). I'd swallow a few careful deprecations in
> 3.0 to avoid that with my vote.
>
> --
> - Mark
>
> http://www.lucidimagination.com

--
Robert Muir
rcm...@gmail.com
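
To make the buffering problem above concrete, here is a minimal sketch in plain Java. It is not Lucene's CharTokenizer code; the class SurrogateSafeRefill and the methods refill and codePoints are hypothetical names made up for illustration. The idea is to leave one spare slot in the buffer so that, when a refill happens to end on a high surrogate, the matching low surrogate can be pulled in before any character is handed on. The consumer then walks the buffer by code point (int), which is the value an int-based isTokenChar/normalize callback would receive.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Lucene's actual implementation.
final class SurrogateSafeRefill {

    // Refills buf from in and returns the number of chars now in buf,
    // or -1 at end of input. buf must have length >= 2.
    // One slot is kept free so a trailing low surrogate always fits.
    static int refill(Reader in, char[] buf) throws IOException {
        int n = in.read(buf, 0, buf.length - 1);
        if (n <= 0) {
            return n;
        }
        // If the chunk ends on a high surrogate, read one more char so the
        // pair is not split across two refills.
        if (Character.isHighSurrogate(buf[n - 1])) {
            int next = in.read();
            if (next != -1) {
                buf[n++] = (char) next;
            }
        }
        return n;
    }

    // Decodes text through a buffer of bufSize chars and returns the code
    // points seen, in order.
    static List<Integer> codePoints(String text, int bufSize) throws IOException {
        List<Integer> out = new ArrayList<>();
        Reader in = new StringReader(text);
        char[] buf = new char[bufSize];
        int n;
        while ((n = refill(in, buf)) > 0) {
            for (int i = 0; i < n; ) {
                // Decode a full code point, not a single UTF-16 unit.
                int cp = Character.codePointAt(buf, i, n);
                out.add(cp);
                i += Character.charCount(cp);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // U+1D538 is encoded as the surrogate pair D835 DD38. With a buffer
        // of 4 chars, the first 3-char read would end on the high surrogate.
        for (int cp : codePoints("ab\uD835\uDD38cd", 4)) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase());
        }
    }
}
```

A tokenizer that iterated over the same buffer char by char, or refilled without the extra read, would see the two halves of the pair as separate, meaningless units whenever the pair straddled a refill boundary.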