Hi Will,

Retraining the relation extractor should be fairly easy. The instructions below apply if you are using cTAKES 3.0. If you are planning to use the trunk version, they may no longer be accurate: relation extraction has undergone some changes recently in connection with the cTAKES-190 issue, and I don't fully understand these most recent changes yet (but I am working on it).

1. Run PreprocessAndWriteXmi in the eval package, specifying the location of the text of the notes, the location of the gold standard relation annotations, and the output directory. This class will run all the preprocessing that is required for relation extraction and add gold standard relation annotations to the CAS. The resulting CASes will be saved to disk as XMI files.

2. Run RelationExtractorEvaluation, passing it the location of the XMI files produced in the previous step and the --grid-search option. This class will use the annotations in the XMI files to find the optimal training parameters using grid search and n-fold cross-validation. When the run completes, record the best set of parameters found by the grid search. If you are short on time, you can skip this step and just use the default SVM parameters.

3. Update the model parameters in the main() method of RelationExtractorTrain (pipelines package) to the values found by the grid search. Run RelationExtractorTrain, specifying the location of the XMI files. This class will (a) create the model needed to deploy the relation module, and (b) create the descriptor files that allow the relation AEs (analysis engines) to be used as part of a UIMA pipeline.
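To illustrate what step 2 does conceptually, here is a minimal Java sketch of grid search with n-fold cross-validation. It is a generic illustration, not the actual code inside RelationExtractorEvaluation. The candidate C values, the round-robin fold split, and the trainAndScore callback are all hypothetical. The callback stands in for "train an SVM on the other folds with this C and return its score on the held-out fold".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

public class GridSearchSketch {

    /** Splits instance indices 0..n-1 into k folds round-robin (fold f gets every i with i % k == f). */
    static List<List<Integer>> makeFolds(int n, int k) {
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(i);
        }
        return folds;
    }

    /**
     * Returns the candidate C with the highest mean score across the k folds.
     * trainAndScore(c, testFold) is a hypothetical callback: it should train on all
     * folds except testFold using cost c and return a score (e.g. F1) on testFold.
     */
    static double bestC(double[] candidates, int n, int k,
                        BiFunction<Double, List<Integer>, Double> trainAndScore) {
        List<List<Integer>> folds = makeFolds(n, k);
        double best = candidates[0];
        double bestScore = Double.NEGATIVE_INFINITY;
        for (double c : candidates) {
            double total = 0.0;
            for (List<Integer> testFold : folds) {
                total += trainAndScore.apply(c, testFold);
            }
            double mean = total / k;
            if (mean > bestScore) {
                bestScore = mean;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] grid = {0.01, 0.1, 1.0, 10.0, 100.0};
        // Toy scorer standing in for real SVM training; it peaks at C = 1.0.
        double best = bestC(grid, 20, 5, (c, fold) -> 1.0 - Math.abs(Math.log10(c)));
        System.out.println("best C = " + best);
    }
}
```

The winning value is what you would then copy into RelationExtractorTrain in step 3. Real grid searches usually sweep more than one parameter (e.g. C and a kernel parameter), which just adds nested loops around the same cross-validation core.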

If you are planning to annotate your data, it might be easier to use Knowtator, since we already have a gold standard reader for Knowtator. If you want to use a different annotation tool, you just need to make sure that the manual annotations are added to the gold view of the XMI files, because the relation extractor reads the gold standard annotations from that view.
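Since the question mentions brat, the first half of that job is reading relations out of brat's standoff (.ann) files. The following is a minimal, hypothetical Java sketch of that parsing step only. It assumes the common brat line shapes "T1<TAB>Type start end<TAB>text" for entities and "R1<TAB>Type Arg1:T1 Arg2:T2" for relations. It skips discontinuous spans, events, attributes, and any relation whose arguments were skipped. The Entity and Relation records are illustrative, not cTAKES types (Java 16+ is assumed for records). Converting them into cTAKES annotations in the gold view is not shown, because that depends on the cTAKES type system version you use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BratRelationReader {

    // Illustrative holder types; these are not cTAKES or brat library classes.
    record Entity(String id, String type, int begin, int end) {}
    record Relation(String type, Entity arg1, Entity arg2) {}

    /** Parses the lines of one brat .ann file and returns its binary relations. */
    static List<Relation> parse(List<String> annLines) {
        Map<String, Entity> entities = new HashMap<>();
        List<String[]> pendingRelations = new ArrayList<>();
        for (String line : annLines) {
            String[] cols = line.split("\t");
            if (cols.length < 2) continue;
            if (cols[0].startsWith("T")) {
                String[] typeAndSpan = cols[1].split(" ");
                // Discontinuous spans ("0 5;8 12") split into more than 3 parts and are skipped.
                if (typeAndSpan.length != 3) continue;
                entities.put(cols[0], new Entity(cols[0], typeAndSpan[0],
                        Integer.parseInt(typeAndSpan[1]), Integer.parseInt(typeAndSpan[2])));
            } else if (cols[0].startsWith("R")) {
                // Resolve relations after all entities are read, so line order does not matter.
                pendingRelations.add(cols[1].split(" "));
            }
        }
        List<Relation> relations = new ArrayList<>();
        for (String[] parts : pendingRelations) {
            if (parts.length != 3) continue;
            Entity arg1 = entities.get(parts[1].substring(parts[1].indexOf(':') + 1));
            Entity arg2 = entities.get(parts[2].substring(parts[2].indexOf(':') + 1));
            // Drop relations whose arguments were not parsed (e.g. discontinuous entities).
            if (arg1 != null && arg2 != null) {
                relations.add(new Relation(parts[0], arg1, arg2));
            }
        }
        return relations;
    }

    public static void main(String[] args) {
        // Example annotation for the text "chest pain" (made-up data).
        List<String> ann = List.of(
                "T1\tAnatomicalSite 0 5\tchest",
                "T2\tSignSymptom 6 10\tpain",
                "R1\tlocation_of Arg1:T2 Arg2:T1");
        for (Relation r : parse(ann)) {
            System.out.println(r.type() + ": " + r.arg1().id() + " -> " + r.arg2().id());
        }
    }
}
```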

Hope this helps,

Dima


On 08/29/2013 06:07 PM, William Karl Thompson wrote:
Hello all,

I'm interested in training the relation extractor on some annotated notes from 
Northwestern clinical data, and I understand that cleartk is currently being 
used for this purpose in the cTAKES project.  Could someone provide some 
pointers on how to go about using cleartk to train models that can then be 
invoked by a cTAKES module? Again, my focus for now is on the relation 
extractor. In case it's relevant, I'm intending to use the brat rapid 
annotation tool (http://brat.nlplab.org/) to generate a gold standard corpus.

Cheers,

Will


--
Dmitriy Dligach, PhD
Research Fellow
Children's Hospital Informatics Program
Boston Children's Hospital and Harvard Medical School
(617) 919-3596