Matthew Hatem created UIMA-2849:
-----------------------------------

             Summary: XMLSerializer is not robust to ascii control characters 
                 Key: UIMA-2849
                 URL: https://issues.apache.org/jira/browse/UIMA-2849
             Project: UIMA
          Issue Type: Bug
          Components: Core Java Framework
    Affects Versions: 2.4.0SDK
            Reporter: Matthew Hatem


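If any string in the CAS contains an ASCII control character other than tab,
line feed or carriage return (here 0x1C), the XMLSerializer fails with the
exception below.  XMLSerializer already escapes characters that have special
meaning in XML, such as '&' and '<'.  Perhaps it would be appropriate to remove
control characters, or to escape them as well when writing XML 1.1.  (XML 1.0
does not allow them at all, not even as character references, so escaping is
only an option for XML 1.1.)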
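
As a rough illustration of the escaping option, here is a standalone Java
sketch of what XML 1.1 escaping could look like.  Xml11Escaper and
escapeForXml11 are hypothetical names, not UIMA API, and this is not how
XMLSerializer is implemented.  It only shows that XML 1.1 lets its
"restricted" control characters appear as numeric character references,
while NUL still cannot be represented.

```java
/**
 * Hypothetical helper (not part of UIMA): escapes text for use in an XML 1.1
 * document, including attribute values (XMI stores string features as attributes).
 */
public final class Xml11Escaper {

    private Xml11Escaper() {}

    /** XML 1.1 RestrictedChar: [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]. */
    static boolean isRestrictedXml11(int cp) {
        return (cp >= 0x1 && cp <= 0x8)
            || cp == 0xB || cp == 0xC
            || (cp >= 0xE && cp <= 0x1F)
            || (cp >= 0x7F && cp <= 0x84)
            || (cp >= 0x86 && cp <= 0x9F);
    }

    /**
     * Escapes XML special characters and writes restricted control characters
     * as &#xNN; references. Unpaired surrogates and U+FFFE/U+FFFF are not
     * handled in this sketch.
     */
    public static String escapeForXml11(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp == 0) {
                throw new IllegalArgumentException("NUL cannot be represented in XML 1.1");
            } else if (cp == '&') {
                sb.append("&amp;");
            } else if (cp == '<') {
                sb.append("&lt;");
            } else if (cp == '>') {
                sb.append("&gt;");
            } else if (cp == '"') {
                sb.append("&quot;");
            } else if (isRestrictedXml11(cp)) {
                sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "a" + (char) 0x1C + "b & c";
        System.out.println(escapeForXml11(input)); // prints a&#x1C;b &amp; c
    }
}
```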

The workaround is to ensure that no string stored in the CAS contains an ASCII
control character other than tab, line feed or carriage return.
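
A minimal Java sketch of this workaround, assuming the text is cleaned before
it is stored in the CAS.  Xml10Sanitizer and stripInvalidXml10Chars are
hypothetical helper names, not part of UIMA; the check follows the XML 1.0
Char production.

```java
/**
 * Hypothetical helper (not part of UIMA): removes every code point that the
 * XML 1.0 "Char" production does not allow, e.g. 0x1C.
 */
public final class Xml10Sanitizer {

    private Xml10Sanitizer() {}

    /** XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns s with all XML 1.0-illegal code points removed. */
    public static String stripInvalidXml10Chars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        // codePoints() keeps valid surrogate pairs together; an unpaired
        // surrogate surfaces as a code point in 0xD800-0xDFFF and is dropped.
        s.codePoints().filter(Xml10Sanitizer::isValidXml10).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "field" + (char) 0x1C + "separator";
        System.out.println(stripInvalidXml10Chars(raw)); // prints fieldseparator
    }
}
```

Usage would be something like
jcas.setDocumentText(Xml10Sanitizer.stripInvalidXml10Chars(text)) before any
annotator runs; string feature values need the same treatment.  Dropping
characters shifts offsets, so if annotations already exist, replacing each
illegal character with a space keeps offsets aligned, since every character
XML 1.0 rejects is a single UTF-16 code unit.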


org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0 character: , 0x1c
        at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.checkForInvalidXmlChars(XMLSerializer.java:254)
        at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.startElement(XMLSerializer.java:174)
        at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.startElement(XmiCasSerializer.java:1003)
        at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeFS(XmiCasSerializer.java:755)
        at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeIndexed(XmiCasSerializer.java:700)
        at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.serialize(XmiCasSerializer.java:268)
        at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.access$700(XmiCasSerializer.java:108)
        at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:1516)
        at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:1496)
        at bugs.UimaXMIBug.writeXmi(UimaXMIBug.java:68)
        at bugs.UimaXMIBug.main(UimaXMIBug.java:38)

