StringUtils.containsAny methods incorrectly match Unicode 2.0+ supplementary characters.
------------------------------------------------------------------------------------------

                 Key: LANG-607
                 URL: https://issues.apache.org/jira/browse/LANG-607
             Project: Commons Lang
          Issue Type: Bug
          Components: lang.*
    Affects Versions: 2.5
         Environment: java version "1.6.0_16"
                      Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
                      Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
                      Microsoft Windows [Version 6.0.6002]
                      Apache Maven 2.2.1 (r801777; 2009-08-06 12:16:01-0700)
                      Java version: 1.6.0_16
                      Java home: C:\Program Files\Java\jdk1.6.0_16\jre
                      Default locale: en_US, platform encoding: Cp1252
                      OS name: "windows vista" version: "6.0" arch: "amd64" Family: "windows"
            Reporter: Gary Gregory
            Assignee: Gary Gregory
            Priority: Minor
             Fix For: 3.0


StringUtils.containsAny methods incorrectly match Unicode 2.0+ supplementary characters.

For example, define a test fixture for the Unicode character U+20000, which is written in Java source as "\uD840\uDC00":

    private static final String CharU20000 = "\uD840\uDC00";
    private static final String CharU20001 = "\uD840\uDC01";

The JRE handles Unicode supplementary characters correctly in this call:

    assertEquals(-1, CharU20000.indexOf(CharU20001));

But this is broken (both calls return true, although the two strings share only the high surrogate \uD840):

    assertEquals(false, StringUtils.containsAny(CharU20000, CharU20001));
    assertEquals(false, StringUtils.containsAny(CharU20001, CharU20000));

This is fine:

    assertEquals(true, StringUtils.contains(CharU20000 + CharU20001, CharU20000));
    assertEquals(true, StringUtils.contains(CharU20000 + CharU20001, CharU20001));
    assertEquals(true, StringUtils.contains(CharU20000, CharU20000));
    assertEquals(false, StringUtils.contains(CharU20000, CharU20001));

because the method calls the JRE to perform the match.

More than you want to know:
- http://java.sun.com/developer/technicalArticles/Intl/Supplementary/
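
To illustrate the difference described in this report, here is a minimal sketch of a code-point-aware check next to a naive char-by-char one. The class SupplementaryContainsAny and both method names are hypothetical. This is not the Commons Lang source or the eventual fix, only one way the comparison could be done using JRE calls that exist since Java 5 (String.codePointAt, String.indexOf(int), Character.charCount).

```java
// Hypothetical illustration only; not the Commons Lang implementation.
final class SupplementaryContainsAny {

    private SupplementaryContainsAny() {
    }

    /**
     * Code-point-aware check: returns true only if some whole code point
     * of str also occurs in searchChars.
     */
    static boolean containsAny(String str, String searchChars) {
        if (str == null || searchChars == null
                || str.length() == 0 || searchChars.length() == 0) {
            return false;
        }
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            // String.indexOf(int) matches a supplementary code point
            // only when both surrogates are present in order.
            if (searchChars.indexOf(cp) >= 0) {
                return true;
            }
            // Advance by 1 char for BMP, 2 chars for supplementary.
            i += Character.charCount(cp);
        }
        return false;
    }

    /**
     * Naive char-by-char check, assumed here to resemble the kind of
     * comparison that yields the false positive in the report.
     */
    static boolean containsAnyByChar(String str, String searchChars) {
        if (str == null || searchChars == null) {
            return false;
        }
        for (int i = 0; i < str.length(); i++) {
            // A lone high surrogate such as \uD840 matches here.
            if (searchChars.indexOf(str.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String charU20000 = "\uD840\uDC00";
        String charU20001 = "\uD840\uDC01";
        System.out.println(containsAnyByChar(charU20000, charU20001)); // true
        System.out.println(containsAny(charU20000, charU20001));       // false
    }
}
```

The naive version reports a match because the two fixtures share the high surrogate \uD840. The code-point version compares U+20000 against U+20001 as whole characters and finds none. Strings containing unpaired surrogates are not treated specially in this sketch.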