Good Morning!

I am trying to use cTAKES tools on a distributed computing platform. I would 
rather not ship the entire compiled cTAKES package (~1.5 GB) out to the shared 
cache when I only need a few annotators and their resources at a time.

I should first mention that I am not very familiar with Maven. I recently 
upgraded cTAKES from v2.5.0, where I was configuring smaller pipelines using 
Ant build files. That process was cumbersome, however, and I can appreciate the 
new modular Maven project layout. I just do not know how to use it in a way 
that is flexible.

Does anyone have any advice on how I can package subsets of cTAKES annotator 
modules and their dependencies/resources, so I can create 'thinner' custom 
pipelines that are geared towards specific tasks?

For example, I might ultimately want a pipeline .JAR that contains only the 
tools needed to extract Left Ventricular Ejection Fraction measurements from 
free text using regular expressions. In such a .JAR I would not need any of the 
dictionary resources or negation annotators, so they could be excluded.

It looks like I could create Maven assembly plugin descriptors to generate 
these custom .JARs, but I would like to see if anyone here has any 
advice/caveats before I pursue this route.


Thanks,
Robert Spurrier
