Iterating character by character is different from lowercasing the entire string at once, so your observation is correct; that's how it's supposed to work. In particular, note this in the String#toLowerCase documentation:
"Since case mappings are not always 1:1 char mappings, the resulting String may be a different length than the original String." So it simply cannot be the same as iterating char-by-char. Dawid On Sat, Dec 1, 2012 at 6:32 AM, Trejkaz <trej...@trypticon.org> wrote: > On Fri, Nov 30, 2012 at 8:22 PM, Ian Lea <ian....@gmail.com> wrote: >> Sounds like a side effect of possibly different, locale-dependent, >> results of using String.toLowerCase() and/or Character.toLowerCase(). >> >> http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#toLowerCase() >> specifically mentions Turkish. >> >> A Google search for "Character.toLowerCase() turkish" gets hits which >> sound relevant. > > Certainly Turkish has special rules because of that uppercase I with > dot. I was more wondering whether LowerCaseFilter was intentionally > doing it differently to String.toLowerCase() or whether it was some > kind of unintentional side-effect of using Character.toLowerCase() > iteratively. > > TX > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org