[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574193#comment-15574193 ]
Ben Fortuna commented on COCOON-2352:
-------------------------------------

[~ilgrosso] Fantastic, thanks. I've used the snapshot dependency to test the fix in my project and noticed one more thing: whilst it does create the Unicode character correctly from the surrogate pair, it doesn't actually HTML-encode the character. To fix this I've created another pull request, which simply encodes the Unicode character created from the surrogate pair:

https://github.com/apache/cocoon/pull/2/files#diff-2b4ac8dab4cdcce4c7ffd948c2490b52R101

I hope it isn't too much trouble to apply this change as well; I'm confident this is the last change required. Many thanks.

> XMLEncoder doesn't support Unicode surrogate pairs
> --------------------------------------------------
>
>                 Key: COCOON-2352
>                 URL: https://issues.apache.org/jira/browse/COCOON-2352
>             Project: Cocoon
>          Issue Type: Bug
>          Components: * Cocoon Core, Blocks: Serializers
>    Affects Versions: 2.1.12
>            Reporter: Ben Fortuna
>            Assignee: Francesco Chicchiriccò
>             Fix For: 2.1.13
>
>
> Whilst investigating an issue with the Sling project and support for emoji
> characters, I've come to notice that the XMLEncoder used by HTMLSerializer
> doesn't support Unicode surrogate pairs to represent higher-order Unicode
> characters.
> A simple unit test that demonstrates this issue is here:
> https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy
> More background information is available in SLING-5973.
> This seems to have been identified and addressed in other Apache projects as well:
> https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22
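The comment describes a two-step fix: first combine the high and low surrogate into a single Unicode code point, then encode that code point once in the output. The Java sketch below illustrates the idea under stated assumptions. The class `SurrogatePairEncoding` and its `encode` method are hypothetical; they are not Cocoon's `XMLEncoder` API or the code in the pull request. The sketch assumes the output should escape markup characters and write every non-ASCII code point as a single decimal numeric character reference.

```java
public final class SurrogatePairEncoding {

    // Hypothetical helper (not Cocoon's XMLEncoder): escapes markup-significant
    // characters and writes each non-ASCII code point as ONE numeric reference.
    static String encode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // codePointAt combines a valid high/low surrogate pair into one
            // code point; an unpaired surrogate comes back as its own char value.
            int cp = text.codePointAt(i);
            switch (cp) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp < 0x80) {
                        // Plain ASCII is written through unchanged.
                        out.append((char) cp);
                    } else if (cp <= 0xFFFF && Character.isSurrogate((char) cp)) {
                        // Assumption: an unpaired surrogate is not a legal XML
                        // character, so this sketch substitutes U+FFFD.
                        out.append("&#65533;");
                    } else {
                        // BMP or supplementary code point: one reference, e.g.
                        // U+1F600 becomes &#128512; rather than two per-char refs.
                        out.append("&#").append(cp).append(';');
                    }
            }
            // Advance by 2 chars for a supplementary character, 1 otherwise.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1F600 GRINNING FACE is stored in a Java String as \uD83D\uDE00.
        String emoji = "smile \uD83D\uDE00 & more";
        System.out.println(encode(emoji));
        // prints: smile &#128512; &amp; more
    }
}
```

The key design choice is iterating by code point (`codePointAt` plus `Character.charCount`) rather than by `char`, so a pair is never split into two separate references. Replacing unpaired surrogates with U+FFFD is one possible policy, assumed here for illustration only.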