Rob,

Are you pulling the existing cTAKES dependencies from Maven Central, or did you have to recreate the cTAKES modules in a local repo of some sort?
It would be good to make cTAKES flexible enough to do what you described (hence separating out modules and resources into their own modules).

--Pei

On Tue, Oct 1, 2013 at 2:06 PM, Robert Spurrier <robert.spurr...@explorys.com> wrote:

> It's been a while, but just to update in case anyone is watching this:
>
> My goal was to create a project full of annotators (both cTAKES and
> home-grown), and "cherry-pick" from them at will to create smaller
> pipelines that could be launched on a Hadoop grid via MapReduce.
>
> My final setup consisted of two Maven aggregator projects, Annotators and
> Pipelines.
>
> Annotators is an aggregator project containing all of the annotators and
> their resources. I am essentially following the cTAKES layout for this
> one. One annotator, one module.
> E.g.:
> Annotators
>   -ctakes-core-annotator
>     Pom.xml
>   -ctakes-pos-tagger-annotator
>     Pom.xml
>   -custom-annotator-one
>     Pom.xml
>   ParentPom.xml
>
> Pipelines is another aggregator project containing the source code to
> generate the pipelines, and the job files that utilize the pipelines on
> the Hadoop grid (effectively serving as the input reader & CAS consumer).
> Each pipeline is its own Maven module, and spits out a .jar that contains
> all of the classes I need to run a UIMA-MapReduce job for that specific
> pipeline. It also creates a resource archive (model files, etc.) that I
> ship off to the Hadoop DistributedCache.
> E.g.:
> Pipelines
>   -custom-base-pipeline
>     Pom.xml
>   -observation-pipeline
>     Pom.xml
>   ParentPom.xml
>
> Notes:
> -I modified the cTAKES pom to put all of the descriptors into each
> individual annotator jar as well as the classes, just so they can
> conveniently be called by name. The "heavier" resources are put on the
> DistributedCache.
>
> -I create individual pipeline distributions in the Pipelines project by
> using Maven's reactor options at the parent project level. E.g. "mvn
> package -pl custom-base-pipeline -am". This builds custom-base-pipeline
> with all of its dependencies, and all of the necessary resources.
>
> -Each pipeline has its own Maven assembly to specify what should be
> included with that pipeline's distribution and resources.
>
> The point of this was to maximize modularity, pipeline flexibility,
> runtime speed, and to keep my pipeline jars as lightweight as possible.
> Though it has many awesome features, I did not want to run every part of
> cTAKES every time.
>
> Cheers,
> Rob
>
> On 9/9/13 11:23 AM, "Robert Spurrier" <robert.spurr...@explorys.com>
> wrote:
>
> >Actually, after poking around in the Maven documentation, I think I have
> >just figured out an approach I like.
> >
> >For each pipeline I wish to create, I will generate a Maven assembly
> >descriptor. I will put each assembly file in the cTAKES root pom.xml.
> >Hopefully this will create each pipeline for me when I run 'package'. This
> >approach will still tie in nicely with the project object model/lifecycle
> >of cTAKES, and generate all my custom jars as well.
> >
> >I will try it out and update this thread with the results.
> >
> >Thanks,
> >Rob
> >
> >On 9/9/13 10:38 AM, "Chen, Pei" <pei.c...@childrens.harvard.edu> wrote:
> >
> >>Hi Robert,
> >>
> >>Are you planning to have a process to build everything from source?
> >>Or were you planning to have a build process that combines the ctakes-***
> >>jars with your custom application jars?
> >>
> >>--Pei
> >>
> >>> -----Original Message-----
> >>> From: Robert Spurrier [mailto:robert.spurr...@explorys.com]
> >>> Sent: Monday, September 09, 2013 9:27 AM
> >>> To: dev@ctakes.apache.org
> >>> Subject: Creating Runnable .JARs From A Subset of cTAKES Maven Modules
> >>>
> >>> Good Morning!
> >>>
> >>> I am trying to use cTAKES tools on a distributed computing platform. I
> >>> would rather not ship the entire compiled cTAKES package (~1.5 GB) out
> >>> to the shared cache when I only need a few annotators and their
> >>> resources at a time.
> >>>
> >>> I should first mention that I am not very familiar with Maven. I
> >>> recently upgraded cTAKES from v 2.5.0, where I was configuring smaller
> >>> pipelines using ant build files. This process was cumbersome, however,
> >>> and I can appreciate the new modular Maven project layout. I just do
> >>> not know how to effectively utilize it in a way that is flexible.
> >>>
> >>> Does anyone have any advice on how I can package subsets of cTAKES
> >>> annotator modules and their dependencies/resources, so I can create
> >>> 'thinner' custom pipelines that are geared towards specific tasks?
> >>>
> >>> For example, I might ultimately want a pipeline .JAR that contains the
> >>> tools to RegEx Left Ventricular Ejection Fraction measurements from
> >>> free text. In such a .JAR I would not need any of the dictionary
> >>> resources or negation annotators, so they could be excluded.
> >>>
> >>> It looks like I could create Maven assembly plugin descriptors to
> >>> generate these custom .JARs, but I would like to see if anyone here
> >>> has any advice/caveats before I pursue this route.
> >>>
> >>> Thanks,
> >>> Robert Spurrier
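
For readers following the thread, the per-pipeline assembly idea discussed above could look roughly like the descriptor below. This is only a minimal sketch and not the file Rob actually used: the descriptor id, its location in the module, and the excluded com.example:heavy-resources artifact are hypothetical placeholders. It unpacks the module and its runtime dependencies into a single jar and leaves out one "heavy" resource artifact that would instead be shipped separately (for example to the DistributedCache).

```xml
<!-- Hypothetical file: custom-base-pipeline/src/assembly/pipeline.xml -->
<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.0.0 http://maven.apache.org/xsd/assembly-2.0.0.xsd">
  <!-- The id is appended to the artifact name as a classifier (placeholder value) -->
  <id>pipeline</id>
  <formats>
    <format>jar</format>
  </formats>
  <!-- Put contents at the jar root rather than under a top-level directory -->
  <includeBaseDirectory>false</includeBaseDirectory>
  <dependencySets>
    <dependencySet>
      <outputDirectory>/</outputDirectory>
      <!-- Include this module's own classes alongside its dependencies -->
      <useProjectArtifact>true</useProjectArtifact>
      <!-- Unpack dependency jars so everything ends up in one flat jar -->
      <unpack>true</unpack>
      <scope>runtime</scope>
      <excludes>
        <!-- Placeholder coordinates for a large resource artifact kept out of the jar -->
        <exclude>com.example:heavy-resources</exclude>
      </excludes>
    </dependencySet>
  </dependencySets>
</assembly>
```

A descriptor like this would typically be referenced from the pipeline module's pom.xml through the maven-assembly-plugin (a `<descriptor>` entry, with the `single` goal bound to the `package` phase), so that a command such as "mvn package -pl custom-base-pipeline -am" from the parent project produces the pipeline jar.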