CasToInlineXml adds whitespace
------------------------------
Key: UIMA-2101
URL: https://issues.apache.org/jira/browse/UIMA-2101
Project: UIMA
Issue Type: Bug
Affects Versions: 2.3.1SDK
Reporter: Steven Bethard
CasToInlineXml adds indentation between adjacent XML elements. For example, for
a single-character document with a single annotation covering that one
character, it will write:
<?xml version="1.0" encoding="UTF-8"?>
<Document>
<uima.tcas.DocumentAnnotation sofa="Sofa" begin="0" end="1" language="x-unspecified">
<uima.tcas.Annotation sofa="Sofa" begin="0" end="1">
</uima.tcas.Annotation>
</uima.tcas.DocumentAnnotation>
</Document>
I think it should instead write everything on a single line, that is:
<?xml version="1.0" encoding="UTF-8"?>
<Document><uima.tcas.DocumentAnnotation sofa="Sofa" begin="0" end="1" language="x-unspecified"><uima.tcas.Annotation sofa="Sofa" begin="0" end="1">
</uima.tcas.Annotation></uima.tcas.DocumentAnnotation></Document>
I believe this could be fixed by replacing the line:
XMLSerializer sax2xml = new XMLSerializer(byteArrayOutputStream);
with the line:
XMLSerializer sax2xml = new XMLSerializer(byteArrayOutputStream, false);
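I think it's a bug that CasToInlineXml changes the character offsets: the
inserted whitespace becomes part of the document text in the inline XML, so the
begin/end values no longer line up with it. I would also be happy if there were
an alternate constructor or a method on CasToInlineXml that allowed disabling
the formatting.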
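To illustrate the effect without needing UIMA on the classpath, here is a
minimal sketch that uses only the JDK's javax.xml.transform API. The class name
InlineXmlIndentDemo and its serialize helper are hypothetical, and it is an
assumption on my part that the boolean passed to XMLSerializer plays the same
role as OutputKeys.INDENT does here. It serializes the same nested elements
with and without indentation and compares the text that remains once the tags
are stripped:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; it is not part of UIMA.
class InlineXmlIndentDemo {

    // Serializes a one-character document ("a") wrapped in two nested
    // annotation elements, with or without indentation.
    static String serialize(boolean indent) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("Document");
        Element docAnn = doc.createElement("uima.tcas.DocumentAnnotation");
        Element ann = doc.createElement("uima.tcas.Annotation");
        ann.setTextContent("a");
        docAnn.appendChild(ann);
        root.appendChild(docAnn);
        doc.appendChild(root);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        // Assumption: this mirrors the formatting switch in XMLSerializer.
        t.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Removes all tags, leaving only the character data between them.
    static String stripTags(String xml) {
        return xml.replaceAll("<[^>]+>", "");
    }

    public static void main(String[] args) throws Exception {
        String plain = stripTags(serialize(false));
        String pretty = stripTags(serialize(true));
        System.out.println("unformatted text length: " + plain.length());
        System.out.println("formatted text length:   " + pretty.length());
    }
}
```

With indentation off, the stripped text is exactly the original document text,
so offsets still apply. With indentation on, the serializer's line breaks end
up in the stripped text as well, which is the offset shift described above.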