> Is core lucene really affected by the change? Or is it only contrib? I
> mean, if we couldn't create an index using core with surrogate pairs and
> other Unicode 4.0 stuff (though I'm not clear on the changes), how can it
> change reading/searching the index?
>
>
Sure, especially core analyzers like SimpleAnalyzer and StopAnalyzer.
Here is an example:

System.out.println(Character.isLetter('\u02C6'));

On JDK 1.4, this prints false.
On JDK 1.5, this prints true.

So if someone indexes this character with Lucene 2.9 on Java 1.4 using one
of these analyzers, and then upgrades to 3.0 (which forces them onto Java 1.5),
they must reindex to get the same behavior: the analyzer now tokenizes the
character differently at query time than it did at index time.
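
To make the effect concrete, here is a rough sketch of how a letter-based
tokenizer would split the same text under the two sets of rules. It is not
Lucene code: the class LetterSplitDemo, the tokenize helper, and the
oldRules/newRules predicates are made up for illustration, and the "old"
predicate only simulates the JDK 1.4 answer for U+02C6 on a modern JVM. It
assumes the analyzer breaks tokens wherever Character.isLetter says no.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class LetterSplitDemo {
    // Hypothetical helper: collects maximal runs of characters that the
    // supplied predicate accepts as letters, dropping everything else.
    static List<String> tokenize(String text, IntPredicate isLetter) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isLetter.test(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "a\u02C6b";
        // Simulated JDK 1.4 rules: U+02C6 is not treated as a letter.
        IntPredicate oldRules = c -> c != 0x02C6 && Character.isLetter(c);
        // Simulated JDK 1.5 rules: U+02C6 is treated as a letter.
        IntPredicate newRules = c -> c == 0x02C6 || Character.isLetter(c);
        System.out.println(tokenize(text, oldRules)); // two tokens: a, b
        System.out.println(tokenize(text, newRules)); // one token: a\u02C6b
    }
}
```

Under the old rules the text is indexed as two terms; under the new rules a
query for the same text produces a single term, which matches nothing in the
old index.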

By the way, I tend to disagree with the argument that this only affects
'weird' characters. I just searched for this character and see many people
using it in their LinkedIn profiles and the like (11.2M Google results,
though who knows whether all of these are exact matches).

-- 
Robert Muir
rcm...@gmail.com
