Actually, I thought about this some more, and I'm changing my position.

Deprecating anything is stupid, because it's still not backwards compatible:
e.g. Character.isLetter(char) itself returns different results now on Java 5,
even if we keep invoking the old char-based method.
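
For illustration only (this is not code from Lucene): a minimal Java sketch of why the char-based methods can't just be kept around. It assumes a Java 5+ runtime and uses U+20000, a CJK Extension B ideograph, as a sample supplementary character. In a String it is a surrogate pair; Character.isLetter(char) rejects each half, while Character.isLetter(int) accepts the full code point. The class and method names are made up.

```java
// Hypothetical demo class, not part of Lucene.
class CharVsCodePoint {
    // Sample supplementary character U+20000, stored as two UTF-16 chars.
    static final String SAMPLE = new String(Character.toChars(0x20000));

    // Counts letters the old way: one UTF-16 char at a time.
    static int countLettersByChar(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    // Counts letters by code point, stepping over both halves of a pair.
    static int countLettersByCodePoint(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) {
                n++;
            }
            i += Character.charCount(cp);
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("by char:       " + countLettersByChar(SAMPLE));
        System.out.println("by code point: " + countLettersByCodePoint(SAMPLE));
    }
}
```

The char-based loop sees two surrogates and no letter; the code point loop sees one letter. A callback that only ever receives a char has no way to get the second answer.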

A hard break is the only solution.

We should have done this deprecation in 2.9, but it's a chicken-and-egg
problem: we could not do it then because you need Java 5 to support Unicode 4.
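
To make the buffering problem from the quoted mail below concrete, here is a rough sketch (an assumption about one possible fix, not the actual CharTokenizer code) of a refill that never ends a buffer in the middle of a surrogate pair. If a full buffer ends on a high surrogate, that char is held back and placed at the start of the next refill. SurrogateSafeBuffer and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class.
class SurrogateSafeBuffer {
    private final Reader in;
    private final char[] buf;
    private int len;            // valid chars in buf after the last fill()
    private char pending;       // high surrogate held back by the last fill()
    private boolean hasPending;

    SurrogateSafeBuffer(Reader in, int size) {
        if (size < 2) {
            throw new IllegalArgumentException("size must be at least 2");
        }
        this.in = in;
        this.buf = new char[size];
    }

    // Refills the buffer; returns false once the input is exhausted.
    boolean fill() throws IOException {
        int off = 0;
        if (hasPending) {
            buf[off++] = pending;
            hasPending = false;
        }
        // Reader.read may return fewer chars than requested, so loop.
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n == -1) {
                break;
            }
            off += n;
        }
        // A full buffer ending on a high surrogate would split a pair:
        // hold that char back for the next refill.
        if (off == buf.length && Character.isHighSurrogate(buf[off - 1])) {
            pending = buf[--off];
            hasPending = true;
        }
        len = off;
        return len > 0;
    }

    String chunk() {
        return new String(buf, 0, len);
    }

    // Convenience for experiments: splits input into buffer-sized chunks.
    static List<String> chunks(String input, int size) {
        try {
            SurrogateSafeBuffer b = new SurrogateSafeBuffer(new StringReader(input), size);
            List<String> out = new ArrayList<String>();
            while (b.fill()) {
                out.add(b.chunk());
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A lone high surrogate at the very end of the input is passed through as-is, since there is nothing left to pair it with. The cost of this approach is that a chunk can be one char shorter than the buffer size.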

On Mon, Nov 16, 2009 at 9:57 PM, Robert Muir <rcm...@gmail.com> wrote:

> Completely ignoring the difficulty, I would propose to fix everything to
> correspond with the Java 1.5 Unicode version, for consistency.
> I would exempt StandardTokenizer, because it's completely inside our
> control. We can fix it at our leisure.
>
> For the rest of this stuff, it's already a 'change in runtime behavior' when
> moving from 1.4 to 1.5, even though we didn't touch code.
> I would suggest making this a one-time pain for the users so they don't have
> to do it again in 3.1.
> For CharTokenizer, this means adding the deprecations plus the kind of
> reflection and reflection caching that Uwe used to make TokenStream work
> this way and stay fast,
> and mucking with complicated i/o buffering logic as mentioned before.
>
>
> For the other side, I'll tell you what I have done in practice.
> I usually say there is no way in hell I will refactor some existing
> codebase to support supplementary characters.
> And I find a way to isolate just Chinese, support it for only that
> language, and leave the other stuff broken.
>
> I'm not really sure that is the appropriate way to go for Apache Lucene,
> but I felt it was fair to at least give that perspective.
> Even if we did that, the non-Chinese users would still need to reindex anyway,
> and for nothing (no real gain: they still don't have Unicode 4 support,
> just different behavior).
>
>
> On Mon, Nov 16, 2009 at 9:47 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
>> So what's your best recommendation? Ignoring the difficulty and just
>> considering what's best for users?
>>
>> Robert Muir wrote:
>> > Well, in all honesty there is a bit of complexity.
>> > I leave the StandardTokenizer out of this; it gives the same results
>> > regardless of JVM version.
>> > It may not be correct, but it's consistent; we could wait till 5.0 or
>> > 10.0 to make it correct :)
>> > Also, because it gives the same results regardless of JVM version, we
>> > can actually use the Version logic to improve it, as Uwe showed.
>> >
>> > The rest of it is where it gets nasty.
>> > Fixing the Simple/StopAnalyzer is actually the worst, because we have
>> > to deprecate the isTokenChar(char) and normalize(char) callbacks in
>> > favor of int-based versions.
>> > We also have to fix the i/o buffering logic present in, for example,
>> > CharTokenizer, which just does things like refill a buffer of size
>> > 4096 without checking to ensure it doesn't break a surrogate pair.
>> >
>> > and then we have contrib...!
>> >
>> > So you see why I ask about 'index backwards compatibility': I
>> > don't consider it actually working between 2.9->3.0 anyway, and adding
>> > that on top of fixing this stuff and ensuring API backwards compat
>> > is especially nasty.
>> >
>> >
>> >
>> >     Always depends though. This double index thing you mention is
>> >     nasty (3.0 and 3.1 for the unfortunate). I'd swallow a few careful
>> >     deprecations in 3.0 to avoid that with my vote.
>> >
>> >     --
>> >     - Mark
>> >
>> >     http://www.lucidimagination.com
>> >
>> >
>> >
>> >
>> >
>> ---------------------------------------------------------------------
>> >     To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> >     <mailto:java-dev-unsubscr...@lucene.apache.org>
>> >     For additional commands, e-mail: java-dev-h...@lucene.apache.org
>> >     <mailto:java-dev-h...@lucene.apache.org>
>> >
>> >
>> >
>> >
>> > --
>> > Robert Muir
>> > rcm...@gmail.com <mailto:rcm...@gmail.com>
>>
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>>
>>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>



-- 
Robert Muir
rcm...@gmail.com
