Dominik Strecker created LANG-1343:
--------------------------------------

             Summary: StringUtils#abbreviate breaks up surrogate pairs
                 Key: LANG-1343
                 URL: https://issues.apache.org/jira/browse/LANG-1343
             Project: Commons Lang
          Issue Type: Bug
          Components: lang.*
    Affects Versions: 3.6
            Reporter: Dominik Strecker
            Priority: Minor


If the last char kept from the input is the first (high) char of a surrogate 
pair, the resulting string contains an unpaired surrogate: the second (low) 
char of the pair is cut off, and the high char is followed directly by the 
first char of the ellipsis.


{code:java}
StringUtils.abbreviate("\uD83D\uDCA9\uD83D\uDCA9\uD83D\uDCA9", 4); // returns "\uD83D..."
{code}
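
For illustration, a minimal sketch of a surrogate-aware variant. The class and 
method names here are hypothetical and not part of Commons Lang; the sketch 
assumes a fixed "..." ellipsis and a minimum width of 4, and it simply backs 
the cut point up by one char when the cut would fall between the two halves of 
a pair.

```java
// Hypothetical helper, NOT the Commons Lang implementation.
final class SurrogateSafeAbbreviate {

    private static final String ELLIPSIS = "...";

    static String abbreviate(String str, int maxWidth) {
        // Assumption: same minimum width as the 3-char ellipsis plus one char.
        if (maxWidth < ELLIPSIS.length() + 1) {
            throw new IllegalArgumentException("Minimum abbreviation width is 4");
        }
        if (str == null || str.length() <= maxWidth) {
            return str;
        }
        // Index of the first char that will be dropped.
        int end = maxWidth - ELLIPSIS.length();
        // If the cut separates a high surrogate from its low surrogate,
        // drop the high surrogate too instead of leaving it dangling.
        if (Character.isHighSurrogate(str.charAt(end - 1))
                && Character.isLowSurrogate(str.charAt(end))) {
            end--;
        }
        return str.substring(0, end) + ELLIPSIS;
    }
}
```

With this sketch the call above yields just "..." for a width of 4, because 
the first code point needs two chars and only one is available; a width of 5 
yields "\uD83D\uDCA9...". The result may therefore be one char shorter than 
maxWidth.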

In my case this breaks further along when the string is transformed to UTF-8 
for a SOAP request.
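
A small sketch of how such a string could be detected before it is encoded. 
The class and method names are hypothetical; the check relies only on the 
standard java.nio.charset API, whose encoder reports an unpaired surrogate as 
not encodable.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class Utf8Check {

    // Returns false if the string contains an unpaired surrogate,
    // which cannot be represented in well-formed UTF-8.
    static boolean isEncodableAsUtf8(String s) {
        // A fresh encoder per call, since CharsetEncoder is not thread-safe.
        return StandardCharsets.UTF_8.newEncoder().canEncode(s);
    }
}
```

Here Utf8Check.isEncodableAsUtf8("\uD83D...") is false, while the same call 
with the complete pair "\uD83D\uDCA9..." is true.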

Should this at least be mentioned in the Javadoc?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
