On the off chance that someone else stumbles across this problem ...

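By default, Java's Xalan transformer for creating XML documents does not
correctly encode emojis or other characters outside the Basic Multilingual
Plane. Instead of &#128077; for the thumbs up emoji (U+1F44D), Xalan writes
&#55357;&#56397;, which is one reference for each half of the UTF-16
surrogate pair. As Arthur pointed out, this is not a valid encoding: XML
does not permit character references to surrogate code points.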
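The numbers above come from how Java stores the emoji internally. The
following sketch reproduces only the arithmetic and does not call Xalan. One
reference per code point gives the correct &#128077;, while one reference
per UTF-16 char splits the surrogate pair into &#55357;&#56397;. The class
and method names (SurrogateRefs, numericRef, perCharRefs) are hypothetical.

  public class SurrogateRefs {
      /** One reference per Unicode code point (the correct form). */
      static String numericRef(String s) {
          StringBuilder sb = new StringBuilder();
          s.codePoints().forEach(cp -> sb.append("&#").append(cp).append(';'));
          return sb.toString();
      }

      /** One reference per UTF-16 char (splits a surrogate pair in two). */
      static String perCharRefs(String s) {
          StringBuilder sb = new StringBuilder();
          for (char c : s.toCharArray()) {
              sb.append("&#").append((int) c).append(';');
          }
          return sb.toString();
      }

      public static void main(String[] args) {
          // U+1F44D THUMBS UP SIGN needs two Java chars (a surrogate pair).
          String thumbsUp = new String(Character.toChars(0x1F44D));
          System.out.println(thumbsUp.length());      // 2
          System.out.println(numericRef(thumbsUp));   // &#128077;
          System.out.println(perCharRefs(thumbsUp));  // &#55357;&#56397;
      }
  }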

One solution is to use Saxonica's Saxon 11 transformer, which produces the
expected output:

  <html>
    <head><meta charset="utf8"/></head>
    <body>
      <p id="caret">the 👍 emoji</p>
    </body>
  </html>

In Java, switching to Saxon entails adding the Saxon JAR files, along with
the XML resolver JARs that Saxon depends on, to the classpath. Then set the
system property before TransformerFactory.newInstance() is called:

  System.setProperty( "javax.xml.transform.TransformerFactory",
                      "net.sf.saxon.TransformerFactoryImpl" );
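Below is a minimal sketch for checking which transformer is in use and
whether its output contains the broken surrogate references. It assumes
nothing beyond the standard JAXP API. It switches to Saxon only if
net.sf.saxon.TransformerFactoryImpl is actually on the classpath and
otherwise falls back to the default factory. The class name
EmojiSerializeCheck and its helper methods are hypothetical.

  import java.io.StringWriter;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;
  import javax.xml.parsers.DocumentBuilderFactory;
  import javax.xml.transform.OutputKeys;
  import javax.xml.transform.Transformer;
  import javax.xml.transform.TransformerFactory;
  import javax.xml.transform.dom.DOMSource;
  import javax.xml.transform.stream.StreamResult;
  import org.w3c.dom.Document;
  import org.w3c.dom.Element;

  public class EmojiSerializeCheck {
      // Decimal (&#55357;) or hex (&#xD83D;) character references.
      private static final Pattern CHAR_REF =
          Pattern.compile("&#(?:x([0-9A-Fa-f]{1,8})|([0-9]{1,10}));");

      /** True if any reference points at a surrogate (U+D800..U+DFFF). */
      static boolean hasSurrogateRefs(String xml) {
          Matcher m = CHAR_REF.matcher(xml);
          while (m.find()) {
              long value = m.group(1) != null
                  ? Long.parseLong(m.group(1), 16)
                  : Long.parseLong(m.group(2));
              if (value >= 0xD800 && value <= 0xDFFF) return true;
          }
          return false;
      }

      /** Builds <html><body><p id="caret">the 👍 emoji</p></body></html>. */
      static Document buildDocument() throws Exception {
          Document doc = DocumentBuilderFactory.newInstance()
              .newDocumentBuilder().newDocument();
          Element html = doc.createElement("html");
          Element body = doc.createElement("body");
          Element p = doc.createElement("p");
          p.setAttribute("id", "caret");
          p.setTextContent("the \uD83D\uDC4D emoji");
          body.appendChild(p);
          html.appendChild(body);
          doc.appendChild(html);
          return doc;
      }

      /** Identity transform using whichever JAXP factory is configured. */
      static String serialize(Document doc) throws Exception {
          Transformer t = TransformerFactory.newInstance().newTransformer();
          t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
          StringWriter out = new StringWriter();
          t.transform(new DOMSource(doc), new StreamResult(out));
          return out.toString();
      }

      public static void main(String[] args) throws Exception {
          try {
              // Only point JAXP at Saxon if its classes can be loaded.
              Class.forName("net.sf.saxon.TransformerFactoryImpl");
              System.setProperty("javax.xml.transform.TransformerFactory",
                                 "net.sf.saxon.TransformerFactoryImpl");
          } catch (ClassNotFoundException e) {
              System.err.println("Saxon not found; using the default factory");
          }
          System.out.println(TransformerFactory.newInstance().getClass().getName());
          String xml = serialize(buildDocument());
          System.out.println(xml);
          System.out.println(hasSurrogateRefs(xml)
              ? "surrogate references found (broken output)"
              : "no surrogate references");
      }
  }

The regex check is a diagnostic heuristic, not a full XML validator. A
parser such as ConTeXt's would still reject the file for other reasons
that this scan does not catch.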

ConTeXt handles the emoji from the transformed XML file without any issues.

Thank you, Arthur.
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________
