Gregoire Jadi created UIMA-3818:
-----------------------------------

             Summary: Unsupported XML entities by XmiCas(De)serializer
                 Key: UIMA-3818
                 URL: https://issues.apache.org/jira/browse/UIMA-3818
             Project: UIMA
          Issue Type: Bug
          Components: Collection Processing
    Affects Versions: 2.4.2SDK
            Reporter: Gregoire Jadi


The UTF-8 character '𝒪' (U+1D4AA) cannot be deserialized by
`XmiCasDeserializer.deserialize`.

Here is a way to reproduce this:

{code:java}
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.uima.cas.impl.XmiCasDeserializer;
import org.apache.uima.cas.impl.XmiCasSerializer;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.jcas.JCas;

public class Test {
    public static void main(String[] args) throws Exception {
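        JCas jCas = JCasFactory.createJCas();
        jCas.setDocumentText("𝒪");
        File file = new File("/tmp/test.xmi");

        // Serialize the CAS to XMI; close the stream before reading the file back.
        try (OutputStream outputStream = new FileOutputStream(file)) {
            XmiCasSerializer.serialize(jCas.getCas(), outputStream);
        }

        // Deserialize the same file into the CAS; this is the call that fails.
        try (InputStream inputStream = new FileInputStream(file)) {
            XmiCasDeserializer.deserialize(inputStream, jCas.getCas());
        }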
    }
}
{code}
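
In case it helps narrow this down: '𝒪' is MATHEMATICAL SCRIPT CAPITAL O, which lies outside the Basic Multilingual Plane, so a Java String stores it as a surrogate pair. I have not verified that this is what trips the deserializer. The sketch below only confirms how the character is encoded, using plain JDK calls; the class name `SupplementaryCharCheck` and the helper `describe` are made up for illustration and are not part of UIMA.

```java
public class SupplementaryCharCheck {

    // Hypothetical helper: summarizes the first code point of a string, i.e.
    // its value, whether it lies outside the Basic Multilingual Plane, and
    // how many Java chars are needed to encode it.
    static String describe(String s) {
        int codePoint = s.codePointAt(0);
        return String.format("U+%04X supplementary=%b chars=%d",
                codePoint,
                Character.isSupplementaryCodePoint(codePoint),
                Character.charCount(codePoint));
    }

    public static void main(String[] args) {
        // Same character as in the reproduction, written as a surrogate pair
        // so the encoding of the source file cannot alter it.
        String text = "\uD835\uDCAA";
        System.out.println(describe(text));
        System.out.println("length=" + text.length()
                + " codePoints=" + text.codePointCount(0, text.length()));
    }
}
```

Running it prints `U+1D4AA supplementary=true chars=2` followed by `length=2 codePoints=1`, so the document text in the reproduction is one code point stored as two Java chars.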

Please tell me if you need more information.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
