RE: adding the relation extractor aggregate to the regression test

Masanz, James J. Wed, 20 Mar 2013 07:27:00 -0700

Perhaps you or Pei can weigh in on the feasibility of having the UMLS resources
downloaded from somewhere else for these regression tests that get run
automatically.


I was guessing we would have a separate set of tests, run when an RC was built,
that would test the pipelines that include the UMLS resources, and that the
regression tests in ctakes-regression-test would at least ensure that the other
(non-UMLS) parts of the pipelines worked.
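A minimal sketch of that two-tier split, assuming the CPEs that need the UMLS resources are listed by name and that a system property (the hypothetical ctakes.test.umls) switches that tier on for RC builds. The class name and the RelationExtractorUMLSCPETest.xml descriptor name are illustrative only, not existing cTAKES code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a two-tier regression setup: CPEs that need the
// UMLS resources run only when an (assumed) system property enables them,
// e.g. on an RC build where the UMLS resources have been installed.
final class RegressionTiers {

    /** Returns the CPE descriptors that should run in this build, in input order. */
    static List<String> selectCpes(List<String> allCpes, Set<String> umlsCpes, boolean umlsEnabled) {
        List<String> selected = new ArrayList<>();
        for (String cpe : allCpes) {
            // Skip UMLS-dependent pipelines unless the UMLS tier is enabled.
            if (umlsCpes.contains(cpe) && !umlsEnabled) {
                continue;
            }
            selected.add(cpe);
        }
        return selected;
    }

    public static void main(String[] args) {
        // Boolean.getBoolean is true only if -Dctakes.test.umls=true was passed.
        boolean umlsEnabled = Boolean.getBoolean("ctakes.test.umls");
        List<String> all = List.of("CoreferenceCPETest.xml", "RelationExtractorUMLSCPETest.xml");
        Set<String> umls = Set.of("RelationExtractorUMLSCPETest.xml"); // hypothetical name
        System.out.println(selectCpes(all, umls, umlsEnabled));
    }
}
```

The everyday mvn test run would then exercise only the non-UMLS pipelines, while the RC build passes the flag to include the UMLS ones.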

-- James

> -----Original Message-----
> From: ctakes-dev-return-1387-Masanz.James=mayo....@incubator.apache.org
> [mailto:ctakes-dev-return-1387-Masanz.James=mayo....@incubator.apache.org]
> On Behalf Of Steven Bethard
> Sent: Wednesday, March 20, 2013 7:37 AM
> To: [email protected]
> Subject: Re: adding the relation extractor aggregate to the regression test
> 
> On Mar 19, 2013, at 10:59 AM, "Masanz, James J." <[email protected]> wrote:
> > For #2, the other consideration is that, unlike the models, the UMLS
> > resources are not available under the ALv2, so they are not going to
> > end up in the ASF repo even if we decide on option 2 in the
> > "[DISCUSS] Where should cTAKES models live?" thread.
> >
> > Because the UMLS resources will not end up in the ASF repo, I think
> > these regression test CPEs are going to have to use
> > DictionaryLookupAnnotator.xml instead of
> > DictionaryLookupAnnotatorUMLS.xml.
> >
> > Here are a few of the terms I remember offhand that are in the 'toy'
> > dictionary used by
> > ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotator.xml:
> 
> I guess my concern here is that the whole point of running a regression
> test is to make sure that the descriptor that we ship actually works. If
> we're not testing the real descriptor we expect people to use, then
> perhaps we shouldn't include that real descriptor at all?
> 
> Or did you envision some other process that would also test the real
> descriptor?
> 
> Steve
> 
> >
> >  knee
> >  pain
> >  aspirin
> >
> > -- James
> >
> >> -----Original Message-----
> >> From: ctakes-dev-return-1381-Masanz.James=mayo....@incubator.apache.org
> >> [mailto:[email protected]]
> >> On Behalf Of Chen, Pei
> >> Sent: Saturday, March 16, 2013 8:52 AM
> >> To: <[email protected]>
> >> Cc: [email protected]
> >> Subject: Re: adding the relation extractor aggregate to the regression test
> >>
> >> The intended behavior of the regression test is to verify that new
> >> code didn't break existing functionality, so yes, the XML output
> >> should be the same as in previous runs. If there are expected changes,
> >> they should just be manually verified and re-recorded. This should
> >> supplement any unit tests, not replace them. It's a 20,000 ft test
> >> that a pipeline still works as expected, not really intended to
> >> replace specific logic tests.
> >> It's a starting point; we can certainly add to it or improve it, both
> >> in terms of adding more unit tests and more regression tests.
> >>
> >> 2) Yes. We'll need to add the UMLS resources if they are to be tested.
> >> Open to ideas and volunteers, as I haven't gotten to that point yet :)
> >>
> >> Sent from my iPhone
> >>
> >> On Mar 16, 2013, at 8:48 AM, "Steven Bethard" <[email protected]> wrote:
> >>
> >>> On Mar 15, 2013, at 9:26 PM, "Pei Chen (JIRA)" <[email protected]> wrote:
> >>>> If you have spare time, do you want to also try adding the relation
> >>>> extractor aggregate to the regression test, and having this (the
> >>>> pipeline as well as the XML descriptor configuration) automatically
> >>>> tested in the future?
> >>>> It should be as simple as adding a CPE to the directory:
> >>>>
> >>>> /ctakes-regression-test/desc/collection_processing_engine/
> >>>> Take a look at
> >>>> http://svn.apache.org/repos/asf/incubator/ctakes/trunk/ctakes-regression-test/desc/collection_processing_engine/CoreferenceCPETest.xml
> >>>> For example:
> >>>> 1)    Just clone it and point the CPE to
> >>>>       ../../../ctakes-relation-extractor/desc/analysis_engine/RelationExtractorAggregate.xml instead.
> >>>> 2)    Run mvn test once (it should probably fail because there is
> >>>>       nothing to compare with, but just collect the generated results).
> >>>> 3)    Copy the results from generatedoutput/{NameofCPEFilename}/ into
> >>>>       expectedoutput/{NameofCPEFilename}
> >>>> 4)    Check the expectedoutput into SVN.
> >>>> 5)    Now every time mvn test is run, that CPE will be executed and its
> >>>>       results compared automatically.
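Steps 3 and 5 amount to a golden-file check. What follows is a sketch of that idea under stated assumptions, not the ctakes-regression-test implementation: it assumes one flat directory of output files per CPE and compares files byte-for-byte, whereas the real harness may compare the XML differently. GoldenOutput, record and compare are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of steps 3 and 5, not the actual ctakes-regression-test code.
final class GoldenOutput {

    /** Lists the regular files directly inside a directory. */
    private static List<Path> filesIn(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    /** Step 3: copy generatedoutput/<CPE>/ into expectedoutput/<CPE>/, replacing any old baseline. */
    static void record(Path generatedDir, Path expectedDir) throws IOException {
        Files.createDirectories(expectedDir);
        for (Path file : filesIn(generatedDir)) {
            Files.copy(file, expectedDir.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Step 5: names of expected files that are missing from, or differ byte-for-byte in, generatedDir. */
    static List<String> compare(Path generatedDir, Path expectedDir) throws IOException {
        if (!Files.isDirectory(expectedDir)) {
            // Mirrors step 2: with nothing recorded yet, the first run cannot pass.
            throw new IllegalStateException("no expected output recorded in " + expectedDir);
        }
        List<String> mismatches = new ArrayList<>();
        for (Path expected : filesIn(expectedDir)) {
            Path generated = generatedDir.resolve(expected.getFileName());
            if (!Files.isRegularFile(generated)
                    || !Arrays.equals(Files.readAllBytes(expected), Files.readAllBytes(generated))) {
                mismatches.add(expected.getFileName().toString());
            }
        }
        Collections.sort(mismatches); // Files.list order is unspecified
        return mismatches;
    }
}
```

Calling record again after an expected change has been manually verified corresponds to the re-recording Pei mentions; a byte-for-byte comparison will also flag any run-to-run noise in the output, such as timestamps, if the pipeline writes any.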
> >>>
> >>> First, a general comment about the regression test, and then some
> >>> details about where I'm currently stuck.
> >>>
> >>> (1) Is it really a good idea to be asserting that the XML files
> >>> generated by cTAKES components should always be identical?
> >>> Particularly if the current components make some mistakes, shouldn't
> >>> we only be asserting the things that they get right? Something more
> >>> along the lines of
> >>> org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotatorsTest,
> >>> where we have individual assertions for each thing the relation
> >>> extractor should have found?
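Steve's alternative, asserting only what a component is known to get right, could look roughly like this. It is a hypothetical sketch: facts are encoded as plain strings in a made-up format, and it does not reproduce RelationExtractorAnnotatorsTest.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: check only the facts a pipeline is known to get right.
// Each fact is encoded as a plain string; the format is made up for this example.
final class ExpectedFacts {

    /** Returns the expected facts that were not found; an empty result means the check passes. */
    static Set<String> missing(Set<String> expected, Set<String> found) {
        Set<String> result = new TreeSet<>(expected); // sorted for stable failure messages
        result.removeAll(found);
        return result;
    }

    public static void main(String[] args) {
        Set<String> expected = Set.of("LOCATION_OF(pain, knee)"); // hypothetical fact
        // A spurious extraction is tolerated; only missing expected facts fail.
        Set<String> found = Set.of("LOCATION_OF(pain, knee)", "LOCATION_OF(aspirin, knee)");
        System.out.println("missing: " + missing(expected, found));
    }
}
```

The trade-off is the one Steve raises: known mistakes no longer break the test, but new spurious output also goes undetected, which the whole-file comparison would catch.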
> >>>
> >>> (2) In trying to add the CPETest, I got stuck trying to get
> >>> ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml
> >>> to work. (This descriptor is referenced by
> >>> ctakes-relation-extractor/desc/analysis_engine/RelationExtractorPreprocessor.xml.)
> >>> Here's the error I'm getting:
> >>>
> >>> org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "RelationExtractorCPETest" failed.
> >>>   at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:83)
> >>>   ...
> >>> Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "RelationExtractorCPETest" failed.
> >>>   at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1104)
> >>>   ...
> >>> Caused by: org.apache.uima.resource.ResourceInitializationException
> >>>   at org.apache.ctakes.core.resource.LuceneIndexReaderResourceImpl.load(LuceneIndexReaderResourceImpl.java:80)
> >>>   ...
> >>> Caused by: java.io.FileNotFoundException: org/apache/ctakes/dictionary/lookup/rxnorm_index
> >>>   at org.apache.ctakes.core.resource.FileLocator.locateExplicitly(FileLocator.java:69)
> >>>   at org.apache.ctakes.core.resource.FileLocator.locateFile(FileLocator.java:44)
> >>>   at org.apache.ctakes.core.resource.LuceneIndexReaderResourceImpl.load(LuceneIndexReaderResourceImpl.java:58)
> >>>   ... 53 more
> >>>
> >>> I assume this is because the UMLS indexes aren't in SVN anymore.
> >>> What's the proper way to reference these now, and should
> >>> DictionaryLookupAnnotatorUMLS.xml be updated appropriately?
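To narrow down a FileNotFoundException like the one above, a small standalone check can report whether a relative path such as org/apache/ctakes/dictionary/lookup/rxnorm_index resolves on the classpath or relative to the working directory. This is a hypothetical diagnostic: it does not reproduce the lookup order of cTAKES's FileLocator, and ClassLoader.getResource may not find directories packaged inside a jar.

```java
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic, not cTAKES's FileLocator: reports where (if anywhere)
// a relative resource path resolves, checking the classpath first and then the
// current working directory.
final class ResourceCheck {

    static String describe(String relativePath, ClassLoader loader) {
        URL onClasspath = loader.getResource(relativePath);
        if (onClasspath != null) {
            return "classpath: " + onClasspath;
        }
        Path onDisk = Paths.get(relativePath).toAbsolutePath();
        if (Files.exists(onDisk)) {
            return "file: " + onDisk;
        }
        return "not found: " + relativePath + " (checked classpath and " + onDisk + ")";
    }

    public static void main(String[] args) {
        // Default to the path named in the FileNotFoundException in the stack trace.
        String path = args.length > 0 ? args[0] : "org/apache/ctakes/dictionary/lookup/rxnorm_index";
        System.out.println(describe(path, ResourceCheck.class.getClassLoader()));
    }
}
```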
> >>>
> >>> Thanks,
> >>>
> >>> Steve
