Re: resources for training modules

Dmitriy Dligach Fri, 30 Aug 2013 06:47:54 -0700

Hi Will,

Retraining the relation extractor should be fairly easy. The instructions I am about to give you apply if you are using cTAKES 3.0. However, if you are planning to use the trunk version, my instructions may no longer be accurate. Relation extraction has undergone some changes recently in connection with the cTAKES-190 issue, and I don't fully understand these most recent changes yet (but I am working on it).

1. Run PreprocessAndWriteXmi in the eval package, specifying the location of the text of the notes, the location of the gold standard relation annotations, and the output directory. This class will run all the preprocessing that is required for relation extraction and add gold standard relation annotations to the CAS. The resulting CASes will be saved to disk as XMI files.

2. Run RelationExtractorEvaluation, passing it the location of the XMI files obtained in the previous step and the --grid-search option. This class will use the annotations in the XMI files to find the optimal training parameters using grid search and n-fold cross-validation. After the execution completes, record the best set of parameters found by the grid search. If you don't have a lot of time, this step can be skipped (you can just use the default SVM parameters).

3. Update the model parameters in the main() method of RelationExtractorTrain (pipelines package) to the values found by the grid search. Run RelationExtractorTrain, specifying the location of the XMI files. This class will (a) create the model that is necessary for deployment of the relation module, and (b) create the descriptor files which will ensure that the relation AEs can be used as part of a UIMA pipeline.
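
In case the grid search in step 2 is unfamiliar, here is a minimal, generic sketch of the idea in plain Java. It does not use the cTAKES or ClearTK APIs, and RelationExtractorEvaluation may well do this differently; the class name, the Scorer interface, and the candidate values are all hypothetical. The toy scorer in main() is a stand-in for "train on the training folds, then measure F1 on the held-out fold".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: NOT the cTAKES/ClearTK implementation.
final class GridSearchSketch {

    /** Trains with `param` on `train` and returns a score (e.g. F1) measured on `test`. */
    interface Scorer<T> {
        double score(double param, List<T> train, List<T> test);
    }

    /** Mean score of one parameter value over `folds` folds (item i is held out in fold i % folds). */
    static <T> double crossValidate(double param, List<T> items, int folds, Scorer<T> scorer) {
        double total = 0.0;
        for (int fold = 0; fold < folds; fold++) {
            List<T> train = new ArrayList<>();
            List<T> test = new ArrayList<>();
            for (int i = 0; i < items.size(); i++) {
                (i % folds == fold ? test : train).add(items.get(i));
            }
            total += scorer.score(param, train, test);
        }
        return total / folds;
    }

    /** Returns the candidate value with the highest mean cross-validation score. */
    static <T> double bestParam(double[] candidates, List<T> items, int folds, Scorer<T> scorer) {
        double best = candidates[0];
        double bestScore = Double.NEGATIVE_INFINITY;
        for (double candidate : candidates) {
            double score = crossValidate(candidate, items, folds, scorer);
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Ten placeholder "documents"; in practice these would be the XMI files.
        List<Integer> docs = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
        // Hypothetical candidate values for a single SVM parameter.
        double[] candidates = {0.01, 0.1, 1.0, 10.0};
        // Toy scorer that ignores the data and simply peaks at param == 1.0.
        double best = bestParam(candidates, docs, 5, (p, train, test) -> -Math.abs(p - 1.0));
        System.out.println("best parameter: " + best);
    }
}
```

The value that comes out of a search like this is what you would record and then copy into RelationExtractorTrain in step 3.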

If you are planning to annotate your data, it might be easier to use Knowtator since we already have a gold standard reader for Knowtator. If you want to use a different annotation tool, you just have to make sure you add the manual annotations to the gold view of the XMI files. The relation extractor reads the gold standard annotations from the gold view.
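
If you do go with brat, a first step would be reading the relations out of its standoff (.ann) files. Below is a minimal sketch of that in plain Java, with no UIMA or cTAKES dependencies. The class and field names are hypothetical, it handles only contiguous entity spans and binary relations, and it assumes the first argument listed on a relation line is the first argument of the relation. Copying the parsed relations into the gold view of the CAS is deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses brat standoff text; nothing here is cTAKES-specific.
final class BratRelationSketch {

    /** A contiguous entity span, e.g. "T1<TAB>Organization 0 4<TAB>Sony". */
    static final class Entity {
        final String id;
        final String type;
        final int begin;
        final int end;

        Entity(String id, String type, int begin, int end) {
            this.id = id;
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** A binary relation, e.g. "R1<TAB>Origin Arg1:T3 Arg2:T4". */
    static final class Relation {
        final String type;
        final Entity arg1;
        final Entity arg2;

        Relation(String type, Entity arg1, Entity arg2) {
            this.type = type;
            this.arg1 = arg1;
            this.arg2 = arg2;
        }
    }

    /** Parses the contents of one .ann file and returns the relations whose arguments were found. */
    static List<Relation> parse(String annText) {
        Map<String, Entity> entities = new HashMap<>();
        List<String[]> pendingRelations = new ArrayList<>();
        for (String line : annText.split("\\r?\\n")) {
            String[] cols = line.split("\t");
            if (cols.length < 2) {
                continue; // blank or malformed line
            }
            String[] parts = cols[1].split(" ");
            if (cols[0].startsWith("T") && parts.length == 3) {
                // Discontinuous spans ("0 5;16 23") have more parts and are skipped here.
                entities.put(cols[0], new Entity(cols[0], parts[0],
                        Integer.parseInt(parts[1]), Integer.parseInt(parts[2])));
            } else if (cols[0].startsWith("R") && parts.length == 3) {
                pendingRelations.add(parts); // resolve after all entities are read
            }
        }
        List<Relation> relations = new ArrayList<>();
        for (String[] parts : pendingRelations) {
            Entity arg1 = entities.get(parts[1].substring(parts[1].indexOf(':') + 1));
            Entity arg2 = entities.get(parts[2].substring(parts[2].indexOf(':') + 1));
            if (arg1 != null && arg2 != null) {
                relations.add(new Relation(parts[0], arg1, arg2));
            }
        }
        return relations;
    }

    public static void main(String[] args) {
        String ann = "T1\tOrganization 0 4\tSony\n"
                + "T2\tLocation 10 15\tJapan\n"
                + "R1\tOrigin Arg1:T1 Arg2:T2\n";
        for (Relation r : parse(ann)) {
            System.out.println(r.type + "(" + r.arg1.id + ", " + r.arg2.id + ")");
        }
    }
}
```

Each parsed relation carries the character offsets of its two arguments, which is the information you would need when creating the corresponding annotations in the gold view.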


Hope this helps,

Dima


On 08/29/2013 06:07 PM, William Karl Thompson wrote:

Hello all,

I'm interested in training the relation extractor on some annotated notes from 
Northwestern clinical data, and I understand that cleartk is currently being 
used for this purpose in the cTAKES project.  Could someone provide some 
pointers on how to go about using cleartk to train models that can then be 
invoked by a cTAKES module? Again, my focus for now is on the relation 
extractor. In case it's relevant, I'm intending to use the brat rapid 
annotation tool (http://brat.nlplab.org/) to generate a gold standard corpus.

Cheers,

Will


--
Dmitriy Dligach, PhD
Research Fellow
Children's Hospital Informatics Program
Boston Children's Hospital and Harvard Medical School
(617) 919-3596
