Gregoire Jadi created UIMA-3818:
-----------------------------------
Summary: Unsupported XML entities by XmiCas(De)serializer
Key: UIMA-3818
URL: https://issues.apache.org/jira/browse/UIMA-3818
Project: UIMA
Issue Type: Bug
Components: Collection Processing
Affects Versions: 2.4.2SDK
Reporter: Gregoire Jadi
The UTF-8 character '𝒪' (U+1D4AA, outside the Basic Multilingual Plane)
cannot be deserialized by `XmiCasDeserializer.deserialize'.
Here is a way to reproduce this:
{code:java}
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.uima.cas.impl.XmiCasDeserializer;
import org.apache.uima.cas.impl.XmiCasSerializer;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.jcas.JCas;
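public class Test {
    public static void main(String[] args) throws Exception {
        JCas jCas = JCasFactory.createJCas();
        // Document text is a single character outside the Basic Multilingual Plane
        jCas.setDocumentText("𝒪");
        File file = new File("/tmp/test.xmi");
        // Serialize the CAS to XMI and close the stream before reading the file back
        try (OutputStream outputStream = new FileOutputStream(file)) {
            XmiCasSerializer.serialize(jCas.getCas(), outputStream);
        }
        // Deserialize the file that was just written; this is the call that fails
        try (InputStream inputStream = new FileInputStream(file)) {
            XmiCasDeserializer.deserialize(inputStream, jCas.getCas());
        }
    }
}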
{code}
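For context, here is a minimal sketch in plain Java (no UIMA classes involved) showing that '𝒪' is a supplementary code point, which Java stores as a surrogate pair of two UTF-16 code units. The class name `SurrogateCheck` and the helper `hasSupplementary` are hypothetical names introduced only for this illustration. That the surrogate pair is what trips up the XMI round trip is an assumption, not something confirmed here.

```java
class SurrogateCheck {
    // Hypothetical helper: true if the string contains any code point above U+FFFF.
    static boolean hasSupplementary(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        // Build the character from its code point to avoid source-encoding issues.
        String text = new String(Character.toChars(0x1D4AA)); // '𝒪'

        // Number of UTF-16 code units versus number of Unicode code points.
        System.out.println("UTF-16 code units: " + text.length());
        System.out.println("code points: " + text.codePointCount(0, text.length()));

        // The first code unit is the high half of a surrogate pair.
        System.out.println("high surrogate first: " + Character.isHighSurrogate(text.charAt(0)));
        System.out.println("has supplementary: " + hasSupplementary(text));
    }
}
```

A check like this could be used to find other document texts likely to hit the same problem, assuming the failure is specific to characters outside the Basic Multilingual Plane.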
Please tell me if you need more information.