[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15590441#comment-15590441 ]
Ben Fortuna commented on COCOON-2352:
-------------------------------------

My sincerest apologies, but I discovered a bug in the patch I submitted. I had assumed we could cast an int to a char to encode the higher-order Unicode characters, but of course this isn't possible, which is why Unicode surrogate pairs exist in the first place.

So I had to make a slight change to the code (again). I have updated two files, XMLEncoder and XMLEncoderTestCase, to ensure that after combining a surrogate pair into a code point we then correctly encode the int value as an HTML-compatible string.

https://github.com/apache/cocoon/pull/3/files

Thanks again, and fingers crossed there are no more changes required. :-)

> XMLEncoder doesn't support Unicode surrogate pairs
> --------------------------------------------------
>
>                 Key: COCOON-2352
>                 URL: https://issues.apache.org/jira/browse/COCOON-2352
>             Project: Cocoon
>          Issue Type: Bug
>          Components: * Cocoon Core, Blocks: Serializers
>    Affects Versions: 2.1.12
>            Reporter: Ben Fortuna
>            Assignee: Francesco Chicchiriccò
>             Fix For: 2.1.13
>
>
> Whilst investigating an issue with the Sling project and support for emoji
> characters, I noticed that the XMLEncoder used by HTMLSerializer
> doesn't support Unicode surrogate pairs, which represent higher-order Unicode
> characters.
> A simple unit test that demonstrates this issue is here:
> https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy
> More background info here also: SLING-5973
> This seems to have been identified/addressed in other Apache projects also:
> https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22
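
For readers following the thread, the problem described in the comment can be sketched roughly as below. This is not the code from the pull request. The class name SurrogateEncodingSketch and its methods are hypothetical, the hexadecimal numeric character reference format is an assumption (the comment only says "HTML-compatible string"), and the handling of unpaired surrogates is likewise a guess. It shows why a cast from int to char loses information, and how a surrogate pair might be combined into a code point before encoding.

```java
import java.util.Locale;

// Hypothetical illustration only; not the actual Cocoon XMLEncoder.
public final class SurrogateEncodingSketch {

    private SurrogateEncodingSketch() { }

    // Demonstrates the pitfall: a char is 16 bits wide, so casting a
    // supplementary code point keeps only its low 16 bits.
    public static char naiveCast(int codePoint) {
        return (char) codePoint;
    }

    // Encodes every non-ASCII character as a hexadecimal numeric character
    // reference. Escaping of markup characters such as '<' and '&' is left
    // out of this sketch.
    public static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int codePoint = c;
            int consumed = 1;
            if (Character.isHighSurrogate(c)
                    && i + 1 < input.length()
                    && Character.isLowSurrogate(input.charAt(i + 1))) {
                // Combine the two UTF-16 code units into one code point.
                codePoint = Character.toCodePoint(c, input.charAt(i + 1));
                consumed = 2;
            } else if (Character.isSurrogate(c)) {
                // Assumption: an unpaired surrogate becomes U+FFFD.
                codePoint = 0xFFFD;
            }
            if (codePoint < 0x80) {
                out.append(c);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(codePoint).toUpperCase(Locale.ROOT))
                   .append(';');
            }
            i += consumed;
        }
        return out.toString();
    }
}
```

In this sketch the encoder appends the int value as text rather than casting it back to a char, which is the step the comment says the original patch got wrong.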