[
https://issues.apache.org/jira/browse/SLING-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591782#comment-15591782
]
Ben Fortuna commented on SLING-5973:
------------------------------------
With thanks to the folks on the Cocoon project, I was able to submit a patch
that adds support for HTML encoding of Unicode surrogate pairs. I've verified
with my test application that emoji now display correctly with the rewriter
enabled. The patch will be included in the next Cocoon release; in the
meantime I was able to test with the latest snapshot build here:
https://repository.apache.org/content/groups/snapshots/org/apache/cocoon/cocoon-serializers-charsets/1.0.3-SNAPSHOT/
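
For anyone curious what "HTML encoding of surrogate pairs" means in practice,
here is a minimal sketch. It is not the actual Cocoon patch; the class and
method names are hypothetical and only the JDK is used. It assumes the
serializer escapes characters that the output charset cannot encode as numeric
character references. Escaping one UTF-16 char at a time turns an emoji into
two references to lone surrogates, which browsers typically render as
replacement characters, whereas escaping per code point emits a single valid
reference.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not code from Cocoon or Sling.
class SurrogateEscapeSketch {

    // Naive approach: looks at one UTF-16 char at a time, so each half of a
    // surrogate pair is escaped separately as an invalid reference.
    static String escapePerChar(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (encoder.canEncode(c)) {
                out.append(c);
            } else {
                out.append("&#").append((int) c).append(';');
            }
        }
        return out.toString();
    }

    // Surrogate-aware approach: walks the string by code point, so a pair is
    // either passed through whole or escaped as one numeric reference.
    static String escapePerCodePoint(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String unit = new String(Character.toChars(cp));
            if (encoder.canEncode(unit)) {
                out.append(unit);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp); // 2 for a surrogate pair, else 1
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "a\uD83D\uDE01b"; // U+1F601 written as a surrogate pair
        System.out.println(escapePerChar(text, StandardCharsets.US_ASCII));
        System.out.println(escapePerCodePoint(text, StandardCharsets.US_ASCII));
    }
}
```

With US-ASCII as the output charset, the first call yields
a&#55357;&#56833;b and the second yields a&#128513;b. With UTF-8 the
code-point version leaves the emoji untouched, since the encoder can
represent it directly.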
> HTMLSerializer not handling some unicode characters (emoji, etc.)
> -----------------------------------------------------------------
>
> Key: SLING-5973
> URL: https://issues.apache.org/jira/browse/SLING-5973
> Project: Sling
> Issue Type: Bug
> Components: Extensions
> Reporter: Ben Fortuna
> Attachments: emoji-no-sling-rewriter.png,
> emoji-with-sling-rewriter.png
>
>
> I've noticed that when I have Unicode special characters (e.g. emoji) in my
> Sling content and the Sling rewriter is enabled, the characters are not output
> correctly to the browser. For example:
> {code}😁{code} becomes {code}��{code}
> If I disable the rewriter pipeline the output is as expected.
> I've looked at the code and I suspect the issue is in the HTMLSerializer from
> the Cocoon library. However, I'm not sure why, as it should be using the
> default encoding for output (which is UTF-8). My rewriter pipeline uses
> the default html-generator and html-serializer provided by Sling.
> My code is available on GitHub here:
> https://github.com/Whistlepost/emojistrip
> It provides a very simple app/content project pair with some emoji characters
> in the content (see src/main/resources/SLING-INF/content/phrases.json). Many
> thanks.