Matthew Hatem created UIMA-2849:
-----------------------------------
Summary: XMLSerializer is not robust to ascii control characters
Key: UIMA-2849
URL: https://issues.apache.org/jira/browse/UIMA-2849
Project: UIMA
Issue Type: Bug
Components: Core Java Framework
Affects Versions: 2.4.0SDK
Reporter: Matthew Hatem
If any string in the CAS contains an ASCII control character that is not
allowed in XML 1.0 (for example 0x1c), XMLSerializer fails with the exception
below. XMLSerializer already appears to escape XML special characters such as
'&' and '<'. Perhaps it would be appropriate to remove such control characters
as well (or, in the case of XML 1.1, to escape them).
The workaround is to ensure that no string stored in the CAS contains ASCII
control characters.
org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0 character: , 0x1c
    at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.checkForInvalidXmlChars(XMLSerializer.java:254)
    at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.startElement(XMLSerializer.java:174)
    at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.startElement(XmiCasSerializer.java:1003)
    at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeFS(XmiCasSerializer.java:755)
    at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeIndexed(XmiCasSerializer.java:700)
    at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.serialize(XmiCasSerializer.java:268)
    at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.access$700(XmiCasSerializer.java:108)
    at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:1516)
    at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:1496)
    at bugs.UimaXMIBug.writeXmi(UimaXMIBug.java:68)
    at bugs.UimaXMIBug.main(UimaXMIBug.java:38)
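
The workaround described above (keeping control characters out of strings
before they are stored in the CAS) could be sketched roughly as follows. This
is only an illustration, not part of UIMA: the class name XmlCharSanitizer and
its methods are hypothetical, and the check simply follows the Char production
of the XML 1.0 specification, on the assumption that dropping the offending
characters is acceptable for the application.

```java
// Hypothetical helper, not a UIMA class: removes code points that the
// XML 1.0 "Char" production does not allow.
final class XmlCharSanitizer {
    private XmlCharSanitizer() {}

    /** True if the code point may appear in an XML 1.0 document. */
    public static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of s without characters that are invalid in XML 1.0. */
    public static String stripInvalidXml10(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            // Iterate by code point so that supplementary characters are kept
            // whole; unpaired surrogates fail the check and are dropped.
            int cp = s.codePointAt(i);
            if (isValidXml10(cp)) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

Applying XmlCharSanitizer.stripInvalidXml10 to each string before it is set on
the CAS would avoid the exception, at the cost of silently discarding the
control characters. Tab, line feed and carriage return are valid in XML 1.0
and are left untouched.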