[
https://issues.apache.org/jira/browse/UIMA-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997599#comment-13997599
]
Gregoire Jadi commented on UIMA-3818:
-------------------------------------
In my current project I use Stanford CoreNLP
(edu.stanford.nlp:stanford-corenlp:3.3.1). For reasons still unknown to me, I
had to exclude its Xalan dependency (xalan:xalan); otherwise I got the same
error as before (i.e. wrong serialization).
{code:xml}
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>3.3.1</version>
  <exclusions>
    <exclusion>
      <groupId>xalan</groupId>
      <artifactId>xalan</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}
(And since I am also using OpenNLP UIMA I had to exclude uimaj-tool from
OpenNLP's dependencies, but that's understandable).
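One possible explanation, which is an assumption on my part and not something confirmed in this issue, is that JAXP resolves its XML implementations from the classpath, so the mere presence of xalan:xalan can change which serializer gets used. A minimal sketch to check which implementations are actually picked up, run once with and once without the exclusion (the class name `JaxpProbe` is hypothetical):

```java
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper: reports which JAXP implementations are resolved
// on the current classpath.
class JaxpProbe {
    /** Class name of the TransformerFactory that JAXP resolves. */
    static String transformerFactoryName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    /** Class name of the SAXParserFactory that JAXP resolves. */
    static String saxParserFactoryName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("TransformerFactory: " + transformerFactoryName());
        System.out.println("SAXParserFactory:   " + saxParserFactoryName());
    }
}
```

If the reported TransformerFactory differs between the two runs, that would at least be consistent with Xalan being involved in the faulty serialization.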
> Unsupported XML entity by XmiCas(De)serializer
> ----------------------------------------------
>
> Key: UIMA-3818
> URL: https://issues.apache.org/jira/browse/UIMA-3818
> Project: UIMA
> Issue Type: Bug
> Components: Collection Processing
> Affects Versions: 2.4.2SDK
> Reporter: Gregoire Jadi
>
> The UTF-8 character '𝒪' (U+1D4AA) cannot be deserialized by
> `XmiCasDeserializer.deserialize`.
> Here is a way to reproduce this:
> {code:java}
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.FileOutputStream;
> import java.io.InputStream;
> import java.io.OutputStream;
>
> import org.apache.uima.cas.impl.XmiCasDeserializer;
> import org.apache.uima.cas.impl.XmiCasSerializer;
> import org.apache.uima.fit.factory.JCasFactory;
> import org.apache.uima.jcas.JCas;
>
> public class Test {
>     public static void main(String[] args) throws Exception {
>         JCas jCas = JCasFactory.createJCas();
>         jCas.setDocumentText("𝒪");
>         File file = new File("/tmp/test.xmi");
>         // Serialize the CAS to XMI; the stream is closed before reading back.
>         try (OutputStream outputStream = new FileOutputStream(file)) {
>             XmiCasSerializer.serialize(jCas.getCas(), outputStream);
>         }
>
>         // Deserializing the file that was just written throws the SAXParseException.
>         try (InputStream inputStream = new FileInputStream(file)) {
>             XmiCasDeserializer.deserialize(inputStream, jCas.getCas());
>         }
>     }
> }
> {code}
> And here is the stacktrace:
> {code}
> [Fatal Error] :1:350: Character reference "�" is an invalid XML character.
> Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 350; Character reference "�" is an invalid XML character.
>     at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>     at org.apache.uima.cas.impl.XmiCasDeserializer.deserialize(XmiCasDeserializer.java:1955)
>     at org.apache.uima.cas.impl.XmiCasDeserializer.deserialize(XmiCasDeserializer.java:1872)
>     at Test.main(Test.java:24)
> [java] Java Result: 1
> {code}
> Please tell me if you need more information.
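For context on why this particular character is fragile: '𝒪' lies outside the Basic Multilingual Plane, so Java stores it as a surrogate pair of two `char` values. The "invalid XML character" message in the stacktrace would be consistent with a serializer escaping each UTF-16 code unit separately instead of the whole code point, although that is an assumption and not confirmed in this issue. A minimal sketch of the difference (the class `SurrogateCheck` and its methods are hypothetical and not part of UIMA):

```java
// Hypothetical illustration, not UIMA code.
class SurrogateCheck {
    /** Formats a code point as a decimal XML numeric character reference. */
    static String charRef(int codePoint) {
        return "&#" + codePoint + ";";
    }

    /** XML 1.0 Char production; the surrogate range U+D800..U+DFFF is excluded. */
    static boolean isValidXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        String text = "\uD835\uDCAA";      // '𝒪' written as its surrogate pair
        int codePoint = text.codePointAt(0);
        char highSurrogate = text.charAt(0);

        System.out.println(text.length());                    // 2
        System.out.println(Integer.toHexString(codePoint));   // 1d4aa
        System.out.println(charRef(codePoint));               // &#119978;
        System.out.println(isValidXml10Char(codePoint));      // true
        System.out.println(isValidXml10Char(highSurrogate));  // false
    }
}
```

A reference to the full code point (`&#119978;`) is valid XML 1.0, whereas a reference to either surrogate on its own is not, and a conforming parser must reject it.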