Ben Fortuna commented on COCOON-2352:

[~ilgrosso] Fantastic, thanks. I've used the snapshot dependency to test the 
fix in my project, and I did notice one more thing: whilst it does create the 
Unicode character correctly from the surrogate pair, it doesn't actually 
HTML-encode the character. 
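
To illustrate the behaviour I'd expect, here is a minimal sketch. It is not 
the actual XMLEncoder code: the class name, the encode() helper and the choice 
of hexadecimal character references are assumptions made purely for 
illustration. It combines each surrogate pair into a single code point and 
writes that code point as a numeric character reference:

```java
class SurrogatePairEncodingSketch {

    // Hypothetical helper, not part of Cocoon. Every code point above
    // US-ASCII is written as a hexadecimal numeric character reference.
    // Escaping of markup characters (&, <, >) is left out for brevity.
    static String encode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            // codePointAt() joins a valid high/low surrogate pair
            // into one supplementary code point.
            int codePoint = input.codePointAt(i);
            if (codePoint > 0x7F) {
                out.append("&#x")
                   .append(Integer.toHexString(codePoint).toUpperCase())
                   .append(';');
            } else {
                out.append((char) codePoint);
            }
            // Advance by 2 chars for a surrogate pair, otherwise by 1.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00"; // U+1F600 as a surrogate pair
        System.out.println(encode("hi " + emoji)); // prints "hi &#x1F600;"
    }
}
```

The important point is that the reference is built from the combined code 
point (U+1F600 here), not from the two surrogates individually.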

To fix this I've created another pull request, which simply encodes 
the Unicode character created from the surrogate pair:


I hope it isn't too much trouble to apply this change as well; I'm confident 
this is the last change required. Many thanks.

> XMLEncoder doesn't support Unicode surrogate pairs
> --------------------------------------------------
>                 Key: COCOON-2352
>                 URL: https://issues.apache.org/jira/browse/COCOON-2352
>             Project: Cocoon
>          Issue Type: Bug
>          Components: * Cocoon Core, Blocks: Serializers
>    Affects Versions: 2.1.12
>            Reporter: Ben Fortuna
>            Assignee: Francesco Chicchiriccò
>             Fix For: 2.1.13
> Whilst investigating an issue with the Sling project and support for emoji 
> characters, I noticed that the XMLEncoder used by HTMLSerializer doesn't 
> support Unicode surrogate pairs, which are used to represent supplementary 
> Unicode characters (code points above U+FFFF).
> A simple unit test that demonstrates this issue is here:
> https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy
> More background info here also: SLING-5973
> This seems to have been identified/addressed in other Apache projects also:
> https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22

This message was sent by Atlassian JIRA
