Hi,

A colleague once tried that with the attached code. It obviously cannot recreate the hierarchy, but for some easy cases it's fine.

Best regards

Markus Krug



On 15.05.2017 at 19:02, José Tomás Atria wrote:
I think Marshall is right, in the sense that you could recover "a" type
system that would allow for the observed features found in a serialized
cas, but that resulting type system would be pretty useless, as it would
have none of the type and hierarchy constraints of the original.

But also, it would probably be full of errors. For example, AFAIK there's
no way to tell whether an int found in the value of a feature is an actual
int value or an xmi:id reference to another annotation in the xmi (e.g.
<cas:Annotation someIntValue="1675" someOtherAnnotation="1765"> offers no
way of distinguishing between the literal int value 1675 of the
"someIntValue" feature and the xmi:id reference held by the
"someOtherAnnotation" feature).
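
To make the ambiguity concrete, here is a minimal sketch using only the JDK's DOM parser. The class name, the "ex" namespace and the types Foo and Bar are made up for illustration. It flags every attribute whose value happens to equal some xmi:id in the file, which is about the best a heuristic can do, and it cannot tell the literal int from the reference:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class AmbiguousIntDemo {

    /** Returns "element@attribute" for every attribute whose value equals some xmi:id. */
    static Set<String> possibleReferences(String xmi) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmi.getBytes(StandardCharsets.UTF_8)));
        NodeList children = doc.getDocumentElement().getChildNodes();

        // First pass: collect every xmi:id that occurs in the document.
        Set<String> ids = new HashSet<>();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element) {
                Element e = (Element) children.item(i);
                if (e.hasAttribute("xmi:id")) {
                    ids.add(e.getAttribute("xmi:id"));
                }
            }
        }

        // Second pass: flag attributes whose value matches a known id.
        Set<String> hits = new TreeSet<>();
        for (int i = 0; i < children.getLength(); i++) {
            if (!(children.item(i) instanceof Element)) {
                continue; // skip whitespace text nodes
            }
            NamedNodeMap attrs = children.item(i).getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                Node a = attrs.item(j);
                if (!a.getNodeName().equals("xmi:id") && ids.contains(a.getNodeValue())) {
                    hits.add(children.item(i).getNodeName() + "@" + a.getNodeName());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: "count" is meant as a literal int, "target" as a reference.
        String xmi = "<xmi:XMI xmlns:xmi=\"http://www.omg.org/XMI\""
                + " xmlns:ex=\"http:///org/example.ecore\">\n"
                + "  <ex:Foo xmi:id=\"1675\" begin=\"0\" end=\"4\"/>\n"
                + "  <ex:Bar xmi:id=\"1765\" begin=\"5\" end=\"9\" count=\"1675\" target=\"1675\"/>\n"
                + "</xmi:XMI>";
        System.out.println(possibleReferences(xmi)); // [ex:Bar@count, ex:Bar@target]
    }
}
```

Both attributes are reported, although only one of them is meant as a reference, so the check can at best say "might be a reference", never "is one".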

In brief: this sounds like trying to infer a valid schema from a set of
conforming instances, which I remember to be a futile effort.

On Mon, May 15, 2017 at 9:59 AM Marshall Schor <[email protected]> wrote:

Hi,

The xmi file would contain just a set of "examples" of the type system,
right?

And there would be nothing there that would indicate the type hierarchy, I
think, although one might be able to heuristically guess at a possible
hierarchy if there were instances of types that were members of various
levels of the hierarchy, for instance if there was a type

foo  with features foof1 (e.g. string) and foof2 (ref to type bar)

superfoo with features just being foof1 (of same type as foof1 in foo)

Then you might be able to conclude a guess about the hierarchy...
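
A minimal sketch of that guess, assuming the feature names per type have already been collected from the xmi (the class and method names here are made up). It proposes as a possible supertype any other type whose observed features are a strict subset. It compares names only, since the xmi does not carry the feature types, and it will also match unrelated types that merely share feature names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class SupertypeGuess {

    /** For each type, lists the other types whose observed feature set is a strict subset. */
    static Map<String, List<String>> guessSupertypes(Map<String, Set<String>> featuresByType) {
        Map<String, List<String>> candidates = new TreeMap<>();
        for (Map.Entry<String, Set<String>> sub : featuresByType.entrySet()) {
            List<String> supers = new ArrayList<>();
            for (Map.Entry<String, Set<String>> sup : featuresByType.entrySet()) {
                // Strict subset: every feature of "sup" occurs in "sub", and "sub" has more.
                boolean strictSubset = sub.getValue().containsAll(sup.getValue())
                        && sup.getValue().size() < sub.getValue().size();
                if (strictSubset) {
                    supers.add(sup.getKey());
                }
            }
            candidates.put(sub.getKey(), supers);
        }
        return candidates;
    }

    public static void main(String[] args) {
        // The foo / superfoo example from above.
        Map<String, Set<String>> observed = new TreeMap<>();
        observed.put("foo", new TreeSet<>(Arrays.asList("foof1", "foof2")));
        observed.put("superfoo", new TreeSet<>(Arrays.asList("foof1")));
        System.out.println(guessSupertypes(observed)); // {foo=[superfoo], superfoo=[]}
    }
}
```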


You might mean, instead, to come up with some type system that would "fit"
the types in the xmi, with no need to have those be the actual type system.
Even that may be difficult, because an xmi instance doesn't describe the
data type of its feature values, and the encoding of a feature value is
ambiguous with respect to the type system, I think. For instance, a feature
reference is encoded as an integer value.

-Marshall


On 5/12/2017 12:32 PM, Richard Eckart de Castilho wrote:
Hi all,

do we have code somewhere that tries to reverse-engineer
a type system description given an XMI file?

Cheers,

-- Richard

--
sent from a phone. please excuse terseness and tpyos.

enviado desde un teléfono. por favor disculpe la parquedad y los erroers.


package de.uniwue.mk.kall.athen.part.editor.util;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.resource.metadata.TypeDescription;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.apache.uima.resource.metadata.impl.TypeSystemDescription_impl;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/**
 * 
 * @author pdbeck
 *
 */

public class XmiToTypeSystemGenerator {

        private static Map<String, String> namespaces = new HashMap<String, String>();

        private static Document document;

        private static TypeSystemDescription tsd;

        private static HashMap<String, Set<String>> types;

        // Example usage (PATH_TO_XMI_FILE is a placeholder):
        // public static void main(String[] args) throws
        // ParserConfigurationException, SAXException, IOException,
        // ResourceInitializationException {
        // File srcFile = new File(PATH_TO_XMI_FILE);
        // TypeSystemDescription tsd = generateTypeSystemFromXmi(srcFile);
        // }

        public static TypeSystemDescription generateTypeSystemFromXmi(File srcFile)
                        throws ParserConfigurationException, SAXException, IOException,
                        ResourceInitializationException {
                init(srcFile);
                resolveNamespaces();
                createTypes();

                return writeTypesToTsd();
        }

        private static TypeSystemDescription writeTypesToTsd()
                        throws FileNotFoundException, SAXException, IOException {
                for (String type : types.keySet()) {
                        Set<String> featureSet = types.get(type);
                        String typeName = getNamespace(type);
                        // The real supertype cannot be recovered from the XMI, so every
                        // type is declared as a direct subtype of uima.tcas.Annotation.
                        TypeDescription addType = tsd.addType(typeName, "", "uima.tcas.Annotation");
                        for (String feature : featureSet) {
                                // Feature ranges are guesses: XML attributes become String
                                // features, child elements become StringList features.
                                if (feature.startsWith("simple:")) {
                                        addType.addFeature(feature.split(":")[1], "", "uima.cas.String");
                                } else {
                                        addType.addFeature(feature.split(":")[1], "", "uima.cas.StringList");
                                }
                        }

                }
                return tsd;
        }

        private static String getNamespace(String type) {
                String[] split = type.split("\\:");
                return namespaces.get(split[0]) + "." + split[1];
        }

        private static void init(File srcFile)
                        throws ParserConfigurationException, SAXException, IOException,
                        ResourceInitializationException {
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                DocumentBuilder builder = factory.newDocumentBuilder();
                document = builder.parse(srcFile);
                types = new HashMap<String, Set<String>>();
                tsd = new TypeSystemDescription_impl();
                // XmiToTypeSystemGenerator.dstFile = dstFile;
        }

        private static void createTypes() {
                // Use the document element: getFirstChild() may return a comment or
                // processing instruction that precedes the XMI root.
                NodeList childNodes = document.getDocumentElement().getChildNodes();
                for (int i = 0; i < childNodes.getLength(); i++) {
                        Node item = childNodes.item(i);
                        // Skip whitespace text nodes and comments between elements.
                        if (item.getNodeType() != Node.ELEMENT_NODE) {
                                continue;
                        }
                        String nodeName = item.getNodeName();
                        if (!nodeName.startsWith("cas:")) {
                                // Merge the features seen on this instance with those
                                // already recorded for the same type.
                                Set<String> featureSet = getFeatureSet(item);
                                if (types.containsKey(nodeName)) {
                                        featureSet.addAll(types.get(nodeName));
                                }
                                types.put(nodeName, featureSet);
                        }
                }
        }

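        private static Set<String> getFeatureSet(Node item) {
                Set<String> featureSet = new HashSet<String>();
                // Child elements are recorded as "complex" features; text and
                // comment nodes are skipped.
                NodeList childNodes = item.getChildNodes();
                for (int i = 0; i < childNodes.getLength(); i++) {
                        Node child = childNodes.item(i);
                        if (child.getNodeType() == Node.ELEMENT_NODE) {
                                featureSet.add("complex:" + child.getNodeName());
                        }
                }
                // getAttributes() returns null for anything that is not an element.
                NamedNodeMap attributes = item.getAttributes();
                if (attributes == null) {
                        return featureSet;
                }
                for (int i = 0; i < attributes.getLength(); i++) {
                        Node child = attributes.item(i);
                        String nodeName = child.getNodeName();
                        // begin, end and sofa are inherited from uima.tcas.Annotation;
                        // xmi:id is the XMI element id, not a feature.
                        if (!nodeName.equals("begin") && !nodeName.equals("end")
                                        && !nodeName.equals("sofa") && !nodeName.equals("xmi:id")) {
                                featureSet.add("simple:" + nodeName);
                        }
                }
                return featureSet;
        }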

        private static void resolveNamespaces() {
                NamedNodeMap attributes = document.getDocumentElement().getAttributes();
                for (int i = 0; i < attributes.getLength(); i++) {
                        Node attributeItem = attributes.item(i);
                        String nodeName = attributeItem.getNodeName();
                        // Only prefixed declarations (xmlns:prefix) carry a prefix to map.
                        if (nodeName.startsWith("xmlns:")) {
                                String[] split = nodeName.split(":");
                                String nodeValue = attributeItem.getNodeValue();
                                // Turn e.g. "http:///uima/tcas.ecore" into "uima.tcas".
                                // replace() takes literal text, so "." is not a regex wildcard.
                                String pre = nodeValue.replace("http:///", "");
                                pre = pre.replace(".ecore", "");
                                pre = pre.replace("/", ".");
                                String namespace = split[1];
                                namespaces.put(namespace, pre);
                        }
                }
        }

}
