> Is core lucene really affected by the change? Or is it only contrib? I
> mean, if we couldn't create an index using core with surrogate pairs and
> other Unicode 4.0 stuff (though I'm not clear on the changes), how can it
> change reading/searching the index?
>

Sure, core is affected, especially core analyzers like SimpleAnalyzer and StopAnalyzer. Both tokenize with LowerCaseTokenizer, which splits text wherever Character.isLetter() returns false. Here is an example:
    System.out.println(Character.isLetter('\u02C6'));

On JDK 1.4, this prints false. On JDK 1.5, this prints true.

So if someone indexes this character on Lucene 2.9 with Java 1.4 using one of these analyzers, then upgrades to 3.0 (which requires Java 1.5), they must reindex to get consistent results. The same text is now tokenized differently, so queries analyzed on 1.5 may not match the terms that were indexed on 1.4.

BTW, as for the argument that only "weird" characters are affected, I tend to disagree. I just searched for this character and saw many people using it, for example in their LinkedIn profiles (11.2M Google results, though who knows how many of those are exact matches).

--
Robert Muir
rcm...@gmail.com
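To make the tokenization effect concrete, here is a minimal sketch that approximates the LowerCaseTokenizer rule: split on any char for which Character.isLetter() is false, then lowercase. The class name LetterSplitDemo and its tokenize() method are hypothetical illustrations, not Lucene code. Because the result depends only on the running JDK's Unicode tables, the same input can produce different tokens on different JDKs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; approximates LetterTokenizer/LowerCaseTokenizer
// behavior (letters only, lowercased) but is NOT Lucene's implementation.
public class LetterSplitDemo {

    // Collects maximal runs of chars for which Character.isLetter is true,
    // lowercasing each char; any non-letter char ends the current token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Foo\u02C6Bar";
        // If Character.isLetter('\u02C6') is false (reported above for JDK 1.4),
        // this yields two tokens, "foo" and "bar".
        // If it is true (reported above for JDK 1.5), this yields one token
        // containing U+02C6, so terms indexed under one JDK may not match
        // queries analyzed under the other.
        System.out.println(Character.isLetter('\u02C6') + " " + tokenize(text));
    }
}
```

The sketch works on individual chars rather than code points, which matches the char-based behavior of the 2.9-era tokenizers but also means it ignores surrogate pairs.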