Good morning! I am trying to use cTAKES tools on a distributed computing platform. I would rather not ship the entire compiled cTAKES package (~1.5 GB) out to the shared cache when I only need a few annotators and their resources at a time.
I should first mention that I am not very familiar with Maven. I recently upgraded from cTAKES v2.5.0, where I was configuring smaller pipelines using Ant build files. That process was cumbersome, however, and I can appreciate the new modular Maven project layout. I just do not know how to use it effectively and flexibly.

Does anyone have advice on how I can package subsets of cTAKES annotator modules and their dependencies/resources, so that I can create 'thinner' custom pipelines geared towards specific tasks? For example, I might ultimately want a pipeline JAR that contains only the tools needed to extract Left Ventricular Ejection Fraction measurements from free text with regular expressions. Such a JAR would not need any of the dictionary resources or negation annotators, so they could be excluded.

It looks like I could write Maven Assembly Plugin descriptors to generate these custom JARs, but I would like to see if anyone here has any advice or caveats before I pursue this route.

Thanks,
Robert Spurrier
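
P.S. To make the ejection fraction example concrete, here is a minimal sketch of the kind of regex logic such a thin pipeline would wrap. It is plain Java and does not use any cTAKES or UIMA API. The class name LvefExtractor, the method extract, and the pattern itself are hypothetical, written only for illustration and not tuned on real clinical notes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of cTAKES: pulls LVEF percentages out of free text.
final class LvefExtractor {

    // Assumed pattern: a keyword (LVEF, EF, or "ejection fraction"), up to 20
    // non-digit characters of filler on the same line, then a one- or
    // two-digit percentage with an optional decimal part.
    private static final Pattern LVEF = Pattern.compile(
        "\\b(?:LVEF|EF|ejection\\s+fraction)\\b[^0-9\\n]{0,20}?(\\d{1,2}(?:\\.\\d+)?)\\s*%",
        Pattern.CASE_INSENSITIVE);

    // Returns every percentage value found, in order of appearance.
    static List<Double> extract(String text) {
        List<Double> values = new ArrayList<>();
        Matcher m = LVEF.matcher(text);
        while (m.find()) {
            values.add(Double.parseDouble(m.group(1)));
        }
        return values;
    }

    public static void main(String[] args) {
        String note = "Echo today. LVEF is 55%. Prior ejection fraction: 35.5 %.";
        System.out.println(extract(note));
    }
}
```

Something this small has no need for dictionary lookups or negation detection, which is why I would like to leave those modules and their resources out of the packaged JAR.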