On 10/23/2013 8:36 AM, Peter Klügl wrote:
> Is it correct that the type system may not change if the analysis engine
> implementation extends JCasAnnotator_ImplBase? I somehow miss the method
> typeSystemInit(). Hmm, should I really switch to CasAnnotator_ImplBase,
> or do I have missed something?

I think the type system is "equal" for these 2 CASes, but not "==", since the
"failing" case recreates a new CAS from the identical metadata.

UIMA is designed with the lifecycle:

1) assemble / configure the pipeline, including merging type systems;
2) use the internal Java objects that were created in (1) to process multiple
   work-items, typically by reusing CASes (via reset()) or by getting new
   CASes from the AnalysisEngine representing the top level of the pipeline
   using analysisEngine.newJCas() or analysisEngine.newCAS(). This produces
   new CASes where the type system impl objects are == (identical).

Approaches which produce type system objects which are equal but not ==
should be discouraged.

You could probably easily detect when a user passes a CAS where the type
system is not ==, and redo your internal setups...

-Marshall

> Peter
>
> On 23.10.2013 14:35, Peter Klügl (JIRA) wrote:
>> [
>> https://issues.apache.org/jira/browse/UIMA-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802850#comment-13802850
>> ]
>>
>> Peter Klügl commented on UIMA-3357:
>> -----------------------------------
>>
>> Thanks for reporting this. I added a test for now.
>>
>> The problem is that the type system has changed, at least its representation
>> in java, but nobody told the analysis engine about it. On the one hand, the
>> environment of the script stores the known types. This is initiated by
>> {{initializeTypes()}} either if the analysis engine was not initialized yet
>> or if the analysis engine is forced to update itself with each process call
>> (parameter reloadScript). On the other hand, the internal "indexing" (begin
>> and end map in RutaBasic) uses the current CAS, its annotations and their
>> types. So we have different type objects that cause problems.
>>
>>
>>> CONTAINS fails when running script as AE in a pipeline with a new CAS
>>> ---------------------------------------------------------------------
>>>
>>> Key: UIMA-3357
>>> URL: https://issues.apache.org/jira/browse/UIMA-3357
>>> Project: UIMA
>>> Issue Type: Bug
>>> Components: ruta, uimaFIT
>>> Affects Versions: 2.0.1ruta, 2.1.0ruta
>>> Reporter: Daniel Maeurer
>>> Assignee: Peter Klügl
>>> Priority: Minor
>>>
>>> When running my Ruta script as an analysis engine in a pipeline, it does
>>> not work correctly when creating a new CAS and processing the pipeline a
>>> second time with the new CAS.
>>> While reusing the old cas with "cas.reset()" is working, creating a new CAS
>>> results in failing rules including "CONTAINS" in the ruta script.
>>> The ruta script used in the example:
>>> {code:title=mystic.ruta|borderStyle=solid}
>>> PACKAGE de.tudarmstadt.algo.vpino.ruta;
>>> DECLARE test;
>>> Document{CONTAINS(CW)->MARK(test)};
>>> {code}
>>> The following Java class can reproduce the error. It creates four xmi
>>> files. The last xmi file is missing the annotations created with rules
>>> including "CONTAINS".
>>> {code:title=MysticPipe.java|borderStyle=solid}
>>> package org.uimafit.pipeline;
>>> import java.io.File;
>>> import java.io.FileOutputStream;
>>> import java.io.IOException;
>>> import java.io.OutputStream;
>>> import java.util.ArrayList;
>>> import java.util.List;
>>> import org.apache.uima.UIMAFramework;
>>> import org.apache.uima.analysis_engine.AnalysisEngine;
>>> import org.apache.uima.analysis_engine.AnalysisEngineDescription;
>>> import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
>>> import org.apache.uima.cas.CAS;
>>> import org.apache.uima.cas.impl.XmiCasSerializer;
>>> import org.apache.uima.fit.factory.AnalysisEngineFactory;
>>> import org.apache.uima.fit.pipeline.SimplePipeline;
>>> import org.apache.uima.resource.ResourceInitializationException;
>>> import org.apache.uima.resource.metadata.ResourceMetaData;
>>> import org.apache.uima.util.CasCreationUtils;
>>> import org.apache.uima.util.InvalidXMLException;
>>> import org.apache.uima.util.XMLInputSource;
>>> import org.apache.uima.util.XMLSerializer;
>>> import org.xml.sax.SAXException;
>>> public class MysticPipe {
>>>     public static void main(String[] args) throws Exception {
>>>         working("This is a test.", initPipeline());
>>>         failing("This is a test.", initPipeline());
>>>     }
>>>     private static AnalysisEngine initPipeline() throws
>>>             ResourceInitializationException, IOException, InvalidXMLException {
>>>         File specFile = new
>>>             File("./descriptor/de/tudarmstadt/algo/vpino/ruta/mysticEngine.xml");
>>>         XMLInputSource in = new XMLInputSource(specFile);
>>>         AnalysisEngineDescription ruta = (AnalysisEngineDescription)
>>>             UIMAFramework.getXMLParser().parseResourceSpecifier(in);
>>>         return AnalysisEngineFactory.createEngine(ruta);
>>>     }
>>>     private static void working(String input, AnalysisEngine theEngine)
>>>             throws ResourceInitializationException, AnalysisEngineProcessException,
>>>             IOException, SAXException {
>>>         final List<ResourceMetaData> metaData = new ArrayList<ResourceMetaData>();
>>>         metaData.add(theEngine.getMetaData());
>>>         final CAS cas = CasCreationUtils.createCas(metaData);
>>>         System.out.println("create a new cas...");
>>>         cas.setDocumentLanguage("de");
>>>         cas.setDocumentText(input);
>>>         SimplePipeline.runPipeline(cas, theEngine);
>>>         writeXmiFile(cas, "works_test1");//CHECK
>>>         //THE DIFFERENCE
>>>         cas.reset();
>>>         //END DIFFERENCE
>>>         System.out.println("create a new cas...");
>>>         cas.setDocumentLanguage("de");
>>>         cas.setDocumentText(input);
>>>         SimplePipeline.runPipeline(cas, theEngine);
>>>         writeXmiFile(cas, "works_test2");//CHECK
>>>     }
>>>     private static void failing(String input, AnalysisEngine theEngine)
>>>             throws ResourceInitializationException, AnalysisEngineProcessException,
>>>             IOException, SAXException {
>>>         final List<ResourceMetaData> metaData = new ArrayList<ResourceMetaData>();
>>>         metaData.add(theEngine.getMetaData());
>>>         final CAS cas = CasCreationUtils.createCas(metaData);
>>>         System.out.println("create a new cas...");
>>>         cas.setDocumentLanguage("de");
>>>         cas.setDocumentText(input);
>>>         SimplePipeline.runPipeline(cas, theEngine);
>>>         writeXmiFile(cas, "works_test3"); // CHECK
>>>         //THE DIFFERENCE
>>>         final CAS cas2 = CasCreationUtils.createCas(metaData);
>>>         //END DIFFERENCE
>>>         System.out.println("create a new cas...");
>>>         cas2.setDocumentLanguage("de");
>>>         cas2.setDocumentText(input);
>>>         SimplePipeline.runPipeline(cas2, theEngine);
>>>         writeXmiFile(cas2, "fail_test4"); //FAIL
>>>         return;
>>>     }
>>>
>>>     public static void writeXmiFile(CAS aCas, String Fname) throws
>>>             IOException, SAXException {
>>>         File outFile = new File("output", Fname + ".xmi");
>>>         OutputStream out = null;
>>>         try {
>>>             // out = new StringOutputStream();
>>>             out = new FileOutputStream(outFile);
>>>             XmiCasSerializer ser = new XmiCasSerializer(aCas.getTypeSystem());
>>>             XMLSerializer xmlSer = new XMLSerializer(out, false);
>>>             ser.serialize(aCas, xmlSer.getContentHandler());
>>>         } finally {
>>>             if (out != null) {
>>>                 out.close();
>>>             }
>>>         }
>>>     }
>>> }
>>> {code}
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.1#6144)
>>
>
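
For readers of this thread, the "equal but not ==" distinction and the
suggested detection can be illustrated without any UIMA dependency. The
following is only a minimal sketch under stated assumptions: FakeTypeSystem
and IdentityAwareComponent are hypothetical stand-ins invented for this
example (they are not UIMA or Ruta classes), and redoInternalSetup() merely
counts calls where a real component would rebuild its cached type
information, much like a typeSystemInit() callback.

```java
import java.util.Objects;

public class TypeSystemIdentityCheck {

    /** Hypothetical stand-in for a type system; NOT the UIMA TypeSystem interface. */
    static final class FakeTypeSystem {
        private final String typeName;

        FakeTypeSystem(String typeName) {
            this.typeName = typeName;
        }

        // Two instances built from the same metadata compare as equal by value.
        @Override
        public boolean equals(Object o) {
            return o instanceof FakeTypeSystem
                    && typeName.equals(((FakeTypeSystem) o).typeName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(typeName);
        }
    }

    /** Hypothetical component that caches state derived from a type system. */
    static final class IdentityAwareComponent {
        private FakeTypeSystem lastTypeSystem; // remembered by reference
        private int setupCount;

        void process(FakeTypeSystem current) {
            // Reference comparison on purpose: an equal-but-different object
            // still invalidates anything cached from the previous instance.
            if (current != lastTypeSystem) {
                redoInternalSetup(current);
                lastTypeSystem = current;
            }
            // ... the actual per-document work would follow here
        }

        private void redoInternalSetup(FakeTypeSystem ts) {
            setupCount++; // placeholder for rebuilding cached type objects
        }

        int getSetupCount() {
            return setupCount;
        }
    }

    public static void main(String[] args) {
        FakeTypeSystem first = new FakeTypeSystem("test");
        FakeTypeSystem rebuilt = new FakeTypeSystem("test"); // same metadata, new object

        IdentityAwareComponent component = new IdentityAwareComponent();
        component.process(first);   // first call: setup runs
        component.process(first);   // same object (like cas.reset()): no new setup
        component.process(rebuilt); // equal but not ==: setup runs again

        System.out.println("equal: " + first.equals(rebuilt));
        System.out.println("same object: " + (first == rebuilt));
        System.out.println("setups: " + component.getSetupCount());
    }
}
```

The reference comparison is the whole point of the sketch: a check based on
equals() would treat the rebuilt type system as unchanged and keep using
state tied to the old objects, which is the kind of mismatch described in
the JIRA comment above.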