[ 
https://issues.apache.org/jira/browse/UIMA-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998480#comment-13998480
 ] 

Richard Eckart de Castilho commented on UIMA-3818:
--------------------------------------------------

CoreNLP transitively depends on an older version of Xalan (2.7.0). We have had 
other problems in the past when buggy old versions of Xalan were on the 
classpath. Instead of excluding Xalan, you could also try adding an explicit 
dependency on Xalan version 2.7.1. Does that work?
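If you try the explicit 2.7.1 dependency, one quick way to see which Xalan actually wins dependency resolution is to ask the classpath directly. The sketch below assumes Xalan's `org.apache.xalan.Version.getVersion()` accessor and calls it reflectively, so it compiles and runs even when Xalan is absent. The class name `XalanCheck` is hypothetical.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.security.CodeSource;

public class XalanCheck {
    // Assumption: Xalan-J exposes org.apache.xalan.Version.getVersion(), e.g. "Xalan Java 2.7.1".
    // Returns that string, or null if Xalan is not on the classpath.
    static String xalanVersion() {
        try {
            Class<?> versionClass = Class.forName("org.apache.xalan.Version");
            Method getVersion = versionClass.getMethod("getVersion");
            return (String) getVersion.invoke(null); // static method, so no receiver
        } catch (ClassNotFoundException e) {
            return null;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unexpected Xalan Version API", e);
        }
    }

    // Returns the JAR or directory the Xalan classes were loaded from, or null if unknown/absent.
    static URL xalanLocation() {
        try {
            CodeSource source = Class.forName("org.apache.xalan.Version")
                    .getProtectionDomain().getCodeSource();
            return source == null ? null : source.getLocation();
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String version = xalanVersion();
        if (version == null) {
            System.out.println("Xalan (org.apache.xalan) is not on the classpath");
        } else {
            System.out.println(version + " loaded from " + xalanLocation());
        }
    }
}
```

Running it with the same classpath as the failing test shows whether 2.7.0 or 2.7.1 is the version actually loaded.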

> Unsupported XML entity by XmiCas(De)serializer
> ----------------------------------------------
>
>                 Key: UIMA-3818
>                 URL: https://issues.apache.org/jira/browse/UIMA-3818
>             Project: UIMA
>          Issue Type: Bug
>          Components: Collection Processing
>    Affects Versions: 2.4.2SDK
>            Reporter: Gregoire Jadi
>
> The UTF-8 character '𝒪' cannot be deserialized by 
> `XmiCasDeserializer.deserialize`.
> Here is a way to reproduce this:
> {code:java}
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.FileOutputStream;
> import java.io.InputStream;
> import java.io.OutputStream;
> import org.apache.uima.cas.impl.XmiCasDeserializer;
> import org.apache.uima.cas.impl.XmiCasSerializer;
> import org.apache.uima.fit.factory.JCasFactory;
> import org.apache.uima.jcas.JCas;
> public class Test {
>     public static void main(String[] args) throws Exception {
>         JCas jCas = JCasFactory.createJCas();
>         // U+1D4AA lies outside the Basic Multilingual Plane (a surrogate pair in Java)
>         jCas.setDocumentText("𝒪");
>         File file = new File("/tmp/test.xmi");
>         // Close the output stream so the XMI is fully written before it is read back
>         try (OutputStream outputStream = new FileOutputStream(file)) {
>             XmiCasSerializer.serialize(jCas.getCas(), outputStream);
>         }
>         // Reading the file just written back in fails with a SAXParseException
>         try (InputStream inputStream = new FileInputStream(file)) {
>             XmiCasDeserializer.deserialize(inputStream, jCas.getCas());
>         }
>     }
> }
> {code}
> And here is the stacktrace:
> {code}
> [Fatal Error] :1:350: Character reference "&#56490" is an invalid XML character.
> Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 350; Character reference "&#56490" is an invalid XML character.
>       at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>       at org.apache.uima.cas.impl.XmiCasDeserializer.deserialize(XmiCasDeserializer.java:1955)
>       at org.apache.uima.cas.impl.XmiCasDeserializer.deserialize(XmiCasDeserializer.java:1872)
>       at Test.main(Test.java:24)
>      [java] Java Result: 1
> {code}
> Please tell me if you need more information.
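The number in the stack trace is informative. 56490 is 0xDCAA, the low surrogate of U+1D4AA MATHEMATICAL SCRIPT CAPITAL O ('𝒪'), which Java stores as the UTF-16 pair 0xD835 0xDCAA. Surrogate code points are excluded from the XML 1.0 Char production, so any conforming parser rejects a character reference to one. This suggests (an inference from the error message, not something confirmed in this thread) that the serializer escaped the two UTF-16 code units separately instead of the single code point. The sketch below contrasts the two escaping strategies and checks both with the JDK's built-in DOM parser. `XmlCharRefs` and its methods are hypothetical helpers, not UIMA or Xalan API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlCharRefs {
    // XML 1.0 "Char" production; surrogates (0xD800-0xDFFF) are not allowed.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Escapes each UTF-16 code unit on its own, splitting surrogate pairs (not well-formed).
    static String perCodeUnit(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append("&#").append((int) s.charAt(i)).append(';');
        }
        return sb.toString();
    }

    // Escapes each Unicode code point, keeping surrogate pairs together (well-formed).
    static String perCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.append("&#").append(cp).append(';');
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // Returns true if the JDK's default DOM parser accepts the document.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing "[Fatal Error] ..." to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // any SAX error means the document was rejected
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\uD835\uDCAA"; // U+1D4AA, the character from the report
        System.out.println(text.length());                               // 2 (UTF-16 code units)
        System.out.println(perCodeUnit(text));                           // &#55349;&#56490;
        System.out.println(perCodePoint(text));                          // &#119978;
        System.out.println(isXmlChar(56490));                            // false
        System.out.println(parses("<t>" + perCodeUnit(text) + "</t>"));  // false
        System.out.println(parses("<t>" + perCodePoint(text) + "</t>")); // true
    }
}
```

Escaping per code point (or writing the character directly as UTF-8) keeps the document well-formed. Splitting the pair is what yields references such as `&#56490;`.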



--
This message was sent by Atlassian JIRA
(v6.2#6252)
