Perhaps you or Pei can weigh in on the feasibility of having the UMLS resource downloaded from somewhere else for these regression tests that get run automatically.
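Below is a minimal sketch of how the automatically run regression tests could choose between the UMLS descriptor (DictionaryLookupAnnotatorUMLS.xml) and the toy one (DictionaryLookupAnnotator.xml). It assumes the UMLS-derived Lucene indexes would be placed in a local resource directory whenever they are available. The descriptor file names and the rxnorm_index path come from this thread. The class name, method names, and directory layout are hypothetical, and none of this is existing cTAKES code.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not part of cTAKES): picks which dictionary lookup
 * descriptor a regression CPE should use, depending on whether the
 * UMLS-derived indexes are present under an assumed resource root.
 */
class DictionaryDescriptorSelector {

    // Descriptor names as mentioned in the thread.
    static final String UMLS_DESCRIPTOR = "DictionaryLookupAnnotatorUMLS.xml";
    static final String TOY_DESCRIPTOR = "DictionaryLookupAnnotator.xml";

    /** True if the (assumed) RxNorm Lucene index exists under the given root. */
    static boolean umlsIndexesAvailable(Path resourceRoot) {
        // Relative path taken from the FileNotFoundException reported in the thread.
        Path rxnormIndex = resourceRoot.resolve("org/apache/ctakes/dictionary/lookup/rxnorm_index");
        return Files.exists(rxnormIndex);
    }

    /** Real UMLS descriptor when the indexes are present, toy dictionary otherwise. */
    static String chooseDescriptor(Path resourceRoot) {
        return umlsIndexesAvailable(resourceRoot) ? UMLS_DESCRIPTOR : TOY_DESCRIPTOR;
    }

    public static void main(String[] args) {
        // Assumed default location; pass a different root as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "resources");
        System.out.println("Using " + chooseDescriptor(root));
    }
}
```

One drawback, raised later in this thread, is that a silent fallback means the descriptor we actually ship may never be exercised. A build that is expected to have the UMLS resources would probably want to fail instead of falling back.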
I was guessing we would have a separate set of tests, run when an RC is built, that would test the pipelines that include the UMLS resources, and that the regression tests in ctakes-regression-test would at least ensure that the other (non-UMLS) parts of the pipelines work.

-- James

> -----Original Message-----
> From: ctakes-dev-return-1387-Masanz.James=mayo....@incubator.apache.org
> [mailto:ctakes-dev-return-1387-Masanz.James=mayo....@incubator.apache.org]
> On Behalf Of Steven Bethard
> Sent: Wednesday, March 20, 2013 7:37 AM
> To: [email protected]
> Subject: Re: adding the relation extractor aggregate to the regression test
>
> On Mar 19, 2013, at 10:59 AM, "Masanz, James J." <[email protected]> wrote:
> > For #2, the other consideration is that, unlike the models, the UMLS resources are not available under the ALv2, so they are not going to end up in the ASF repo even if we decide on option 2 for the "[DISCUSS] Where should cTAKES models live?" thread.
> >
> > Because the UMLS resources will not end up in the ASF repo, I think we are going to have to use DictionaryLookupAnnotator.xml instead of DictionaryLookupAnnotatorUMLS.xml for these regression test CPEs.
> >
> > Here are a few of the terms I remember offhand that are in the 'toy' dictionary used by ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotator.xml
>
> I guess my concern here is that the whole point of running a regression test is to make sure that the descriptor we ship actually works. If we're not testing the real descriptor we expect people to use, then perhaps we shouldn't include that real descriptor at all?
>
> Or did you envision some other process that would also test the real descriptor?
>
> Steve
>
> >
> > knee
> > pain
> > aspirin
> >
> > -- James
> >
> >> -----Original Message-----
> >> From: ctakes-dev-return-1381-Masanz.James=mayo....@incubator.apache.org
> >> [mailto:[email protected].org]
> >> On Behalf Of Chen, Pei
> >> Sent: Saturday, March 16, 2013 8:52 AM
> >> To: <[email protected]>
> >> Cc: [email protected]
> >> Subject: Re: adding the relation extractor aggregate to the regression test
> >>
> >> The intended behavior of the regression test is to verify that new code didn't break existing functionality, so yes, the XML output should be the same as in previous runs. If there are expected changes, they should just be manually verified and re-recorded. This should supplement any unit tests, not replace them. It's a 20,000 ft test that a pipeline still works as expected, and it's not really intended to replace specific logic tests. It's a starting point; we can certainly add more or improve it, both in terms of adding more unit tests and more regression tests.
> >>
> >> 2) Yes. We'll need to add UMLS resources if they are to be tested. Open to ideas and volunteers, as I didn't get to that point yet :)
> >>
> >> Sent from my iPhone
> >>
> >> On Mar 16, 2013, at 8:48 AM, "Steven Bethard" <[email protected]> wrote:
> >>
> >>> On Mar 15, 2013, at 9:26 PM, "Pei Chen (JIRA)" <[email protected]> wrote:
> >>>> If you have spare time, do you want to also try adding the relation extractor aggregate to the regression test, and having it (the pipeline as well as the XML descriptor configuration) automatically tested in the future?
> >>>> It should be as simple as adding a CPE to the directory.
> >>>>
> >>>> /ctakes-regression-test/desc/collection_processing_engine/
> >>>> Take a look at http://svn.apache.org/repos/asf/incubator/ctakes/trunk/ctakes-regression-test/desc/collection_processing_engine/CoreferenceCPETest.xml
> >>>> For example:
> >>>> 1) Just clone it and point the CPE to ../../../ctakes-relation-extractor/desc/analysis_engine/RelationExtractorAggregate.xml instead.
> >>>> 2) Run mvn test once (it should probably fail because there is nothing to compare with, but just collect the generated results).
> >>>> 3) Copy the results from generatedoutput/{NameofCPEFilename}/ into expectedoutput/{NameofCPEFilename}
> >>>> 4) Check the expectedoutput into SVN.
> >>>> 5) Now every time mvn test is run, that CPE will be executed and its results compared automatically.
> >>>
> >>> First, a general comment about the regression test, and then some details about where I'm currently stuck.
> >>>
> >>> (1) Is it really a good idea to assert that the XML files generated by cTAKES components should always be identical? Particularly if the current components make some mistakes, shouldn't we only be asserting the things that they get right? Something more along the lines of org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotatorsTest, where we have individual assertions for each thing the relation extractor should have found?
> >>>
> >>> (2) In trying to add the CPETest, I got stuck trying to get ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml to work. (This descriptor is referenced by ctakes-relation-extractor/desc/analysis_engine/RelationExtractorPreprocessor.xml.) Here's the error I'm getting:
> >>>
> >>> org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "RelationExtractorCPETest" failed.
> >>>     at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:83)
> >>>     ...
> >>> Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "RelationExtractorCPETest" failed.
> >>>     at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1104)
> >>>     ...
> >>> Caused by: org.apache.uima.resource.ResourceInitializationException
> >>>     at org.apache.ctakes.core.resource.LuceneIndexReaderResourceImpl.load(LuceneIndexReaderResourceImpl.java:80)
> >>>     ...
> >>> Caused by: java.io.FileNotFoundException: org/apache/ctakes/dictionary/lookup/rxnorm_index
> >>>     at org.apache.ctakes.core.resource.FileLocator.locateExplicitly(FileLocator.java:69)
> >>>     at org.apache.ctakes.core.resource.FileLocator.locateFile(FileLocator.java:44)
> >>>     at org.apache.ctakes.core.resource.LuceneIndexReaderResourceImpl.load(LuceneIndexReaderResourceImpl.java:58)
> >>>     ... 53 more
> >>>
> >>> I assume this is because the UMLS indexes aren't in SVN anymore. What's the proper way to reference these now, and should DictionaryLookupAnnotatorUMLS.xml be updated appropriately?
> >>>
> >>> Thanks,
> >>>
> >>> Steve
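The comparison described in steps 3) to 5) can be pictured roughly as follows. This is a hypothetical sketch, not the ctakes-regression-test implementation. It assumes the check is byte-for-byte: each file recorded under expectedoutput/{NameofCPEFilename}/ is compared with the file at the same relative path under generatedoutput/{NameofCPEFilename}/. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical sketch of the regression comparison step: every file under the
 * expected directory must exist under the generated directory with identical bytes.
 */
class RegressionOutputComparer {

    /** Returns human-readable mismatches; an empty list means the run matches. */
    static List<String> compare(Path expectedDir, Path generatedDir) throws IOException {
        List<String> mismatches = new ArrayList<>();
        List<Path> expectedFiles;
        // Files.walk throws NoSuchFileException if expectedDir does not exist yet.
        try (Stream<Path> walk = Files.walk(expectedDir)) {
            expectedFiles = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        for (Path expected : expectedFiles) {
            // Map expectedoutput/<cpe>/x.xml to generatedoutput/<cpe>/x.xml.
            Path relative = expectedDir.relativize(expected);
            Path generated = generatedDir.resolve(relative);
            if (!Files.isRegularFile(generated)) {
                mismatches.add("missing: " + relative);
            } else if (!Arrays.equals(Files.readAllBytes(expected), Files.readAllBytes(generated))) {
                mismatches.add("differs: " + relative);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) throws IOException {
        String cpe = args.length > 0 ? args[0] : "RelationExtractorCPETest";
        List<String> problems = compare(Paths.get("expectedoutput", cpe), Paths.get("generatedoutput", cpe));
        problems.forEach(System.out::println);
        System.exit(problems.isEmpty() ? 0 : 1);
    }
}
```

Because a byte-exact check fails on any change, including fixes, every intended change means re-recording expectedoutput, as Pei describes. The individual assertions Steve suggests in (1) would avoid that, at the cost of writing each check by hand. This sketch does not report files that appear only in generatedoutput. On the first run, the missing expectedoutput directory makes Files.walk throw, which roughly matches the failure expected in step 2).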
