Jasper Krauter created BATIK-1328:
-------------------------------------

             Summary: No support for unicode characters in U+10000 - U+10FFFF range
                 Key: BATIK-1328
                 URL: https://issues.apache.org/jira/browse/BATIK-1328
             Project: Batik
          Issue Type: Bug
          Components: SVG DOM
    Affects Versions: 1.13
            Reporter: Jasper Krauter


The SVG Transcoder checks for valid XML characters but does not account for 
characters outside the Basic Multilingual Plane, which Java strings represent 
as two chars (a UTF-16 surrogate pair). Since neither char of such a pair is a 
valid XML character on its own, the transcoder fails. The 
[XML 1.0 specification|https://www.w3.org/TR/xml/#charsets], however, does 
allow those characters.
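
To illustrate the representation problem, here is a minimal sketch using only standard JDK methods. The class and helper names are made up for this example and are not part of Batik.

```java
// Hypothetical demo class (not Batik code): shows how one supplementary
// character occupies two Java chars.
final class SurrogatePairDemo {

    /** Number of UTF-16 code units (Java chars) in the string. */
    static int utf16Units(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the string. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String wave = "\uD83D\uDC4B"; // U+1F44B WAVING HAND SIGN

        System.out.println(utf16Units(wave));  // two chars ...
        System.out.println(codePoints(wave));  // ... but a single code point

        // Each half is a surrogate, which is not an XML Char on its own.
        System.out.println(Character.isHighSurrogate(wave.charAt(0)));
        System.out.println(Character.isLowSurrogate(wave.charAt(1)));

        // Reading by code point yields the full character.
        System.out.println(Integer.toHexString(wave.codePointAt(0)));
    }
}
```

A loop that inspects the string with {{String#charAt}} therefore sees two surrogate values instead of one valid character.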

In {{org.apache.batik.dom.util.DOMUtilities#contentToString}}, 
{{String#codePointAt}} should be used instead of {{String#charAt}} to extract 
individual characters. The code points can then be properly appended to the 
output string using {{StringBuffer#appendCodePoint}}. The methods that check 
for character validity already account for code points.
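
A rough sketch of the suggested loop is shown below. It is not the actual Batik source: the class name, {{copyValidated}}, and {{isXmlChar}} are hypothetical stand-ins, the validity check simply follows the Char production of the XML 1.0 specification, and any escaping done by the real method is left out.

```java
// Hypothetical sketch of the proposed fix (not Batik's implementation):
// iterate by code point instead of by char.
final class CodePointCopySketch {

    /**
     * XML 1.0 Char production:
     * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
     */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Copies the input, rejecting code points that are not XML characters. */
    static String copyValidated(String s) {
        StringBuffer out = new StringBuffer(s.length());
        int i = 0;
        while (i < s.length()) {
            // codePointAt combines a valid surrogate pair into one code point;
            // an unpaired surrogate is returned as-is and fails the check.
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                throw new IllegalArgumentException(
                        "Invalid character: U+" + Integer.toHexString(cp));
            }
            out.appendCodePoint(cp);
            // Advance by one or two chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "Hello, world! \uD83D\uDC4B";
        System.out.println(copyValidated(input).equals(input));
    }
}
```

With this approach a well-formed surrogate pair passes validation as a single code point, while a lone surrogate is still rejected.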

Code example to reproduce the issue:
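{code:java}
import java.io.OutputStreamWriter;
import org.apache.batik.anim.dom.SVGDOMImplementation;
import org.apache.batik.transcoder.TranscoderInput;
import org.apache.batik.transcoder.TranscoderOutput;
import org.apache.batik.transcoder.svg2svg.SVGTranscoder;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Build a minimal SVG document whose text contains a character above U+FFFF.
String svgNS = SVGDOMImplementation.SVG_NAMESPACE_URI;
Document doc = SVGDOMImplementation.getDOMImplementation()
        .createDocument(svgNS, "svg", null);
Element text = doc.createElementNS(svgNS, "text");
text.setTextContent("Hello, world! 👋");
doc.getDocumentElement().appendChild(text);

// Serialize the document; the emoji triggers the failure.
var transcoder = new SVGTranscoder();
TranscoderOutput out = new TranscoderOutput(new OutputStreamWriter(System.out));
TranscoderInput in = new TranscoderInput(doc);
transcoder.transcode(in, out);{code}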
throws
{code:java}
Exception in thread "main" java.lang.RuntimeException: IO:Invalid character
    at batik.transcoder@1.13/org.apache.batik.transcoder.svg2svg.SVGTranscoder.transcode(SVGTranscoder.java:179){code}


