[ 
https://issues.apache.org/jira/browse/UIMA-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662704#comment-14662704
 ] 

Petr Baudis commented on UIMA-3818:
-----------------------------------

I experienced this issue with 2.6.0 too. It seems to be just a bad interaction
with "rogue" versions of the XML libraries brought onto the classpath by
Stanford NLP.  Disabling Xerces was not enough for me; in the DKPro + Gradle
context I also had to exclude XOM:

  
  compile("de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl:$dkproVersion") {
    // this dependency breaks utf8 XMI serialization, cf. UIMA-3818
    exclude group: "com.io7m.xom", module: "xom"
  }

(I suspect the culprit is xml-apis or something, but I didn't investigate 
further and this fixes the issue for me).
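
For anyone who does want to investigate further, a small probe like the one
below can show which XML implementations actually win on the classpath. It is
only a sketch: the class name XmlImplProbe is made up for illustration, and it
assumes the code under test obtains its parser and transformer through the
standard JAXP factories. The quoted stack trace does show
org.apache.xerces.parsers (a standalone Xerces jar) rather than the JDK's
built-in parser, which is the kind of thing this would reveal.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper: reports which JAXP implementations are picked up
// and which jar (if any) they were loaded from.
public class XmlImplProbe {

    // Returns "<implementation class> from <jar location or 'JDK built-in'>".
    public static String describe(Object impl) {
        Class<?> c = impl.getClass();
        CodeSource src = c.getProtectionDomain().getCodeSource();
        // Classes loaded by the boot loader have no code source.
        String where = (src == null || src.getLocation() == null)
                ? "JDK built-in"
                : src.getLocation().toString();
        return c.getName() + " from " + where;
    }

    public static void main(String[] args) {
        System.out.println("SAX parser:  " + describe(SAXParserFactory.newInstance()));
        System.out.println("Transformer: " + describe(TransformerFactory.newInstance()));
    }
}
```

Running it once with and once without the exclusion, on the same classpath as
the failing application, should make it visible whether the factories resolve
to a different jar.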

> Unsuported XML entity by XmiCas(De)serializer
> ---------------------------------------------
>
>                 Key: UIMA-3818
>                 URL: https://issues.apache.org/jira/browse/UIMA-3818
>             Project: UIMA
>          Issue Type: Bug
>          Components: Collection Processing
>    Affects Versions: 2.4.2SDK
>            Reporter: Gregoire Jadi
>             Fix For: 2.6.0SDK
>
>
> The UTF-8 character '𝒪' cannot be deserialized by
> `XmiCasDeserializer.deserialize`.
> Here is a way to reproduce this:
> {code:java}
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.FileOutputStream;
> import java.io.InputStream;
> import java.io.OutputStream;
> import org.apache.uima.cas.impl.XmiCasDeserializer;
> import org.apache.uima.cas.impl.XmiCasSerializer;
> import org.apache.uima.fit.factory.JCasFactory;
> import org.apache.uima.jcas.JCas;
> public class Test {
>     public static void main(String[] args) throws Exception {
>         JCas jCas = JCasFactory.createJCas();
>         jCas.setDocumentText("𝒪");
>         File file = new File("/tmp/test.xmi");
>         OutputStream outputStream = new FileOutputStream(file);
>         XmiCasSerializer.serialize(jCas.getCas(), outputStream);
>         InputStream inputStream = new FileInputStream(file);
>         XmiCasDeserializer.deserialize(inputStream, jCas.getCas());
>     }
> }
> {code}
> And here is the stacktrace:
> {code}
> [Fatal Error] :1:350: Character reference "&#56490" is an invalid XML character.
> Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 350; Character reference "&#56490" is an invalid XML character.
>       at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>       at org.apache.uima.cas.impl.XmiCasDeserializer.deserialize(XmiCasDeserializer.java:1955)
>       at org.apache.uima.cas.impl.XmiCasDeserializer.deserialize(XmiCasDeserializer.java:1872)
>       at Test.main(Test.java:24)
>      [java] Java Result: 1
> {code}
> Please tell me if you need more information.
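
A note on the number in the error above: 56490 is 0xDCAA, the low surrogate of
the UTF-16 pair that Java uses for '𝒪' (U+1D4AA). A lone surrogate is not a
legal XML character, so the parser is right to reject it. The sketch below
illustrates the difference between escaping per UTF-16 char and escaping per
code point. That the misbehaving serializer escapes per char is an assumption
that merely matches the error message; the class and method names are
illustrative and not part of UIMA or XOM.

```java
// Illustrative only: shows how a supplementary character turns into an
// invalid character reference when escaped one UTF-16 char at a time.
public class SurrogateDemo {

    // '𝒪' (U+1D4AA) written as its UTF-16 surrogate pair.
    public static final String SCRIPT_O = "\uD835\uDCAA";

    // Naive escaping: one numeric reference per UTF-16 char.
    // For supplementary characters this emits two lone surrogates.
    public static String escapePerChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append("&#").append((int) s.charAt(i)).append(';');
        }
        return sb.toString();
    }

    // Escaping per Unicode code point: one reference per character.
    public static String escapePerCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append("&#").append(cp).append(';'));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapePerChar(SCRIPT_O));      // &#55349;&#56490;
        System.out.println(escapePerCodePoint(SCRIPT_O)); // &#119978;
    }
}
```

The second form (or simply writing the character itself in a UTF-8 document)
is what a deserializer can read back.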



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
