Perhaps we can start without the UMLS resources first. (I was not very comfortable with the last release because of the limited test coverage, so I hope this will be a starting point.)
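One way to start without the UMLS resources is to skip any regression CPE whose descriptor chain reaches DictionaryLookupAnnotatorUMLS.xml. In the relation extractor case discussed in this thread, the CPE reaches it only indirectly (RelationExtractorPreprocessor.xml references it), so the check has to follow references transitively. The Python sketch below is hypothetical and is not part of ctakes-regression-test, which runs under mvn test. The function names are illustrative, and it assumes that any `location` or `href` attribute ending in .xml points at another descriptor relative to the referencing file. Imports by name (resolved through the classpath) are not followed.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

# Descriptor that needs the UMLS-licensed indexes (name taken from this thread).
UMLS_DESCRIPTOR = "DictionaryLookupAnnotatorUMLS.xml"


def referenced_descriptors(path: Path) -> list:
    """Heuristic (an assumption, not UIMA's own resolver): treat every
    'location' or 'href' attribute ending in .xml as a reference to another
    descriptor, relative to the referencing file's directory."""
    refs = []
    for elem in ET.parse(path).getroot().iter():
        for attr in ("location", "href"):
            value = elem.get(attr)
            if value and value.endswith(".xml"):
                refs.append((path.parent / value).resolve())
    return refs


def needs_umls(path: Path, seen: Optional[set] = None) -> bool:
    """True if the descriptor, or anything it references transitively,
    is the UMLS dictionary lookup descriptor."""
    seen = set() if seen is None else seen
    path = path.resolve()
    if path.name == UMLS_DESCRIPTOR:
        return True
    if path in seen or not path.is_file():
        return False  # already visited, or a reference we cannot follow
    seen.add(path)
    return any(needs_umls(ref, seen) for ref in referenced_descriptors(path))


def select_cpes(cpe_dir: Path, umls_available: bool) -> list:
    """All CPEs when UMLS is available; otherwise only the non-UMLS ones."""
    cpes = sorted(cpe_dir.glob("*.xml"))
    return [c for c in cpes if umls_available or not needs_umls(c)]
```

With `umls_available=False` the default run covers only the non-UMLS pipelines; a run where a tester has supplied credentials and downloaded the resources could pass `True` to cover the full set.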
I think it is definitely possible to automatically download the UMLS resources, unpack them, have the tester provide credentials, and run the tests. None of this gets distributed, so I think it's fine. I could take a stab at this in a few weeks unless someone gives it a shot first.

Sent from my iPhone

On Mar 20, 2013, at 10:27 AM, "Masanz, James J." <[email protected]> wrote:

> Perhaps you or Pei can weigh in on the feasibility of having the UMLS
> resources downloaded from somewhere else for these regression tests that get
> run automatically.
>
> I was guessing we would have a separate set of tests that would be run when an
> RC was built that would test the pipelines that include the UMLS resources,
> but that the regression tests in ctakes-regression-test would at least ensure
> the other (non-UMLS) parts of the pipelines worked.
>
> -- James
>
>> -----Original Message-----
>> From: ctakes-dev-return-1387-Masanz.James=mayo....@incubator.apache.org
>> [mailto:ctakes-dev-return-1387-Masanz.James=mayo....@incubator.apache.org]
>> On Behalf Of Steven Bethard
>> Sent: Wednesday, March 20, 2013 7:37 AM
>> To: [email protected]
>> Subject: Re: adding the relation extractor aggregate to the regression
>> test
>>
>> On Mar 19, 2013, at 10:59 AM, "Masanz, James J." <[email protected]>
>> wrote:
>>> For #2, the other consideration is that unlike the models, the UMLS
>>> resources are not available under the ALv2, so they are not going to
>>> end up in the ASF repo even if we decide on option 2 for the
>>> "[DISCUSS] Where should cTAKES models live?"
>>> thread.
>>>
>>> Because the UMLS resources will not end up in the ASF repo, for these
>>> regression test CPEs, instead of using
>>> DictionaryLookupAnnotatorUMLS.xml I think we are going to have to use
>>> DictionaryLookupAnnotator.xml.
>>>
>>> Here are a few of the terms I remember offhand that are in the 'toy'
>>> dictionary used by
>>> ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotator.xml
>>
>> I guess my concern here is that the whole point of running a regression
>> test is to make sure that the descriptor that we ship actually works. If
>> we're not testing the real descriptor we expect people to use, then
>> perhaps we shouldn't include that real descriptor at all?
>>
>> Or did you envision some other process that would also test the real
>> descriptor?
>>
>> Steve
>>
>>>
>>> knee
>>> pain
>>> aspirin
>>>
>>> -- James
>>>
>>>> -----Original Message-----
>>>> From:
>>>> ctakes-dev-return-1381-Masanz.James=mayo....@incubator.apache.org
>>>> [mailto:[email protected]
>>>> .org]
>>>> On Behalf Of Chen, Pei
>>>> Sent: Saturday, March 16, 2013 8:52 AM
>>>> To: <[email protected]>
>>>> Cc: [email protected]
>>>> Subject: Re: adding the relation extractor aggregate to the
>>>> regression test
>>>>
>>>> The intended behavior of the regression test is to verify that new
>>>> code didn't break existing functionality, so yes, the XML output
>>>> should be the same as in previous runs. If there are expected changes,
>>>> they should just be manually verified and re-recorded. This should
>>>> supplement any unit tests but not replace them. It's a 20,000 ft test
>>>> that a pipeline still works as expected and not really intended to
>>>> replace specific logical tests.
>>>> It's a starting point; we can certainly add more or improve it, both
>>>> in terms of adding more unit tests and more regression tests.
>>>>
>>>> 2) Yes. We'll need to add UMLS resources if they are to be tested.
>>>> Open to ideas and volunteers, as I didn't get to that point yet :)
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Mar 16, 2013, at 8:48 AM, "Steven Bethard"
>>>> <[email protected]> wrote:
>>>>
>>>>> On Mar 15, 2013, at 9:26 PM, "Pei Chen (JIRA)" <[email protected]> wrote:
>>>>>> If you have spare time, do you want to also try adding the relation
>>>>>> extractor aggregate to the regression test, so that this pipeline (as
>>>>>> well as its XML descriptor configuration) is tested automatically in
>>>>>> the future?
>>>>>> It should be as simple as adding a CPE to the directory
>>>>>>
>>>>>> /ctakes-regression-test/desc/collection_processing_engine/
>>>>>>
>>>>>> Take a look at
>>>>>> http://svn.apache.org/repos/asf/incubator/ctakes/trunk/ctakes-regression-test/desc/collection_processing_engine/CoreferenceCPETest.xml
>>>>>> For example:
>>>>>> 1) Clone that CPE and point it to
>>>>>>    ../../../ctakes-relation-extractor/desc/analysis_engine/RelationExtractorAggregate.xml instead.
>>>>>> 2) Run mvn test once (it will probably fail because there is nothing
>>>>>>    to compare against yet; just collect the generated results).
>>>>>> 3) Copy the results from generatedoutput/{NameofCPEFilename}/ into
>>>>>>    expectedoutput/{NameofCPEFilename}
>>>>>> 4) Check the expectedoutput into SVN.
>>>>>> 5) Now every time mvn test is run, that CPE will be executed and its
>>>>>>    results compared automatically.
>>>>>
>>>>> First, a general comment about the regression test, and then some
>>>>> details about where I'm currently stuck.
>>>>>
>>>>> (1) Is it really a good idea to be asserting that the XML files
>>>>> generated by cTAKES components should always be identical?
>>>>> Particularly if the current components make some mistakes, shouldn't
>>>>> we only be asserting the things that they get right?
>>>>> Something more along the lines of
>>>>> org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotatorsTest,
>>>>> where we have individual assertions for each thing the relation
>>>>> extractor should have found?
>>>>>
>>>>> (2) In trying to add the CPETest, I got stuck trying to get
>>>>> ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml
>>>>> to work. (This descriptor is referenced by
>>>>> ctakes-relation-extractor/desc/analysis_engine/RelationExtractorPreprocessor.xml.)
>>>>> Here's the error I'm getting:
>>>>>
>>>>> org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "RelationExtractorCPETest" failed.
>>>>>     at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:83)
>>>>>     ...
>>>>> Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "RelationExtractorCPETest" failed.
>>>>>     at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1104)
>>>>>     ...
>>>>> Caused by: org.apache.uima.resource.ResourceInitializationException
>>>>>     at org.apache.ctakes.core.resource.LuceneIndexReaderResourceImpl.load(LuceneIndexReaderResourceImpl.java:80)
>>>>>     ...
>>>>> Caused by: java.io.FileNotFoundException: org/apache/ctakes/dictionary/lookup/rxnorm_index
>>>>>     at org.apache.ctakes.core.resource.FileLocator.locateExplicitly(FileLocator.java:69)
>>>>>     at org.apache.ctakes.core.resource.FileLocator.locateFile(FileLocator.java:44)
>>>>>     at org.apache.ctakes.core.resource.LuceneIndexReaderResourceImpl.load(LuceneIndexReaderResourceImpl.java:58)
>>>>>     ... 53 more
>>>>>
>>>>> I assume this is because the UMLS indexes aren't in SVN anymore.
>>>>> What's the proper way to reference these now, and should
>>>>> DictionaryLookupAnnotatorUMLS.xml be updated appropriately?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Steve
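The record-and-compare procedure in steps 2 to 5 of the JIRA comment above is golden-file testing. Below is a minimal Python sketch of that logic, assuming one directory of XML files per CPE under generatedoutput/ and expectedoutput/. The function names are hypothetical, and the actual comparison in ctakes-regression-test runs in Java under mvn test and may differ (for example, in whether it normalizes the XML first). The sketch compares files byte for byte, which is exactly the behavior Steve's point (1) questions: any change in output, correct or not, fails the run until the expected files are re-recorded.

```python
import shutil
from pathlib import Path


def _xml_files(root: Path) -> set:
    """Relative paths of all XML files under root (empty if root is missing)."""
    if not root.is_dir():
        return set()
    return {p.relative_to(root) for p in root.rglob("*.xml")}


def compare_with_expected(generated_dir: Path, expected_dir: Path) -> list:
    """Compare a CPE's generated XML against its recorded expected output.

    Returns a list of problems; an empty list means the run matched exactly.
    """
    generated = _xml_files(generated_dir)
    expected = _xml_files(expected_dir)
    problems = [f"missing output: {rel}" for rel in sorted(expected - generated)]
    problems += [f"unexpected output: {rel}" for rel in sorted(generated - expected)]
    for rel in sorted(generated & expected):
        # Byte-for-byte comparison: any change at all counts as a regression.
        if (generated_dir / rel).read_bytes() != (expected_dir / rel).read_bytes():
            problems.append(f"differs: {rel}")
    return problems


def record_expected(generated_dir: Path, expected_dir: Path) -> None:
    """Copy a manually verified run into the expected-output tree (step 3)."""
    shutil.copytree(generated_dir, expected_dir, dirs_exist_ok=True)
```

On the first run the expected tree does not exist yet, so every generated file is reported as unexpected and the run fails, as step 2 anticipates. Once a verified run has been recorded, the same call returns an empty list until the output changes.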
