Hello Pei,

Since my usage is purely local for now, I created an external copy of cTAKES from SVN. I package any customizations/changes and deploy them to my local Maven repo. I then use these local/custom artifacts as dependencies in my 'Pipelines' project.
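For anyone following along, a dependency on a locally built cTAKES module might look roughly like the fragment below. This is only a sketch: the version string is hypothetical, and it assumes the customized module was placed in the local repository (for example with `mvn install`) under the standard `org.apache.ctakes` group.

```xml
<!-- Fragment of a pipeline module's pom.xml. -->
<dependencies>
  <dependency>
    <groupId>org.apache.ctakes</groupId>
    <artifactId>ctakes-core</artifactId>
    <!-- Hypothetical version: a distinct string keeps the locally built,
         customized artifact from being confused with the release
         published on Maven Central. -->
    <version>3.1.0-custom-SNAPSHOT</version>
  </dependency>
</dependencies>
```

Maven resolves this from the local repository first, so no remote repository entry is needed as long as the artifact has been installed there.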
Thanks,
Rob

On 10/1/13 2:48 PM, "Pei Chen" <chen...@apache.org> wrote:

>Rob,
>Are you pulling the existing ctakes dependencies from Maven Central? Or
>did you have to recreate the ctakes modules in a local repo of some sort?
>It would be good to make ctakes flexible enough to do what you described
>(hence separating out modules and resources into their own modules).
>--Pei
>
>
>On Tue, Oct 1, 2013 at 2:06 PM, Robert Spurrier <
>robert.spurr...@explorys.com> wrote:
>
>> It's been a while, but just to update in case anyone is watching this:
>>
>> My goal was to create a project full of annotators (both cTAKES and
>> home-grown), and "cherry-pick" from them at will to create smaller
>> pipelines that could be launched on a hadoop grid via MapReduce.
>>
>> My final setup consisted of two Maven aggregator projects, Annotators
>> and Pipelines.
>>
>> Annotators is an aggregator project containing all of the annotators and
>> their resources. I am essentially following the cTAKES layout for this
>> one. One annotator, one module.
>> E.g.:
>> Annotators
>>   -ctakes-core-annotator
>>      Pom.xml
>>   -ctakes-pos-tagger-annotator
>>      Pom.xml
>>   -custom-annotator-one
>>      Pom.xml
>>   ParentPom.xml
>>
>>
>> Pipelines is another aggregator project containing the source code to
>> generate the pipelines, and the job files that utilize the pipelines on
>> the hadoop grid (effectively serving as the input reader & CAS
>> consumer).
>> Each pipeline is its own Maven module, and spits out a .jar that
>> contains all of the classes I need to run a UIMA-MapReduce job for that
>> specific pipeline. It also creates a resource archive (model files, etc.)
>> that I ship off to the Hadoop DistributedCache.
>> E.g.:
>> Pipelines
>>   -custom-base-pipeline
>>      Pom.xml
>>   -observation-pipeline
>>      Pom.xml
>>   ParentPom.xml
>>
>>
>> Notes:
>> -I modified the cTAKES pom to put all of the descriptors into each
>> individual annotator jar as well as the classes, just so they can
>> conveniently be called by name. The "heavier" resources are put on the
>> DistributedCache.
>>
>> -I create individual pipeline distributions in the Pipelines project by
>> using Maven Reactor Plugin at the parent project level. E.g. "mvn
>> package -pl custom-base-pipeline -am". This builds custom-base-pipeline
>> with all of its dependencies, and all of the necessary resources.
>>
>> -Each pipeline has its own Maven assembly to specify what should be
>> included with that pipeline's distribution and resources.
>>
>>
>> The point of this was to maximize modularity, pipeline flexibility,
>> runtime speed, and to keep my pipeline jars as lightweight as possible.
>> Though it has many awesome features, I did not want to run every part of
>> cTAKES every time.
>>
>>
>> Cheers,
>> Rob
>>
>>
>> On 9/9/13 11:23 AM, "Robert Spurrier" <robert.spurr...@explorys.com>
>> wrote:
>>
>> >Actually, after poking around in the Maven documentation, I think I have
>> >just figured out an approach I like.
>> >
>> >For each pipeline I wish to create, I will generate a Maven assembly
>> >descriptor. I will put each assembly file in the cTAKES root pom.xml.
>> >Hopefully this will create each pipeline for me when I run 'package'.
>> >This approach will still tie in nicely with the project object
>> >model/lifecycle of cTAKES, and generate all my custom jars as well.
>> >
>> >I will try it out and update this thread with the results.
>> >
>> >Thanks,
>> >Rob
>> >
>> >
>> >On 9/9/13 10:38 AM, "Chen, Pei" <pei.c...@childrens.harvard.edu> wrote:
>> >
>> >>Hi Robert,
>> >>
>> >>Are you planning a process to build everything from source?
>> >>Or were you planning to have a build process that combines the
>> >>ctakes-*** jars with your custom application jars?
>> >>
>> >>--Pei
>> >>
>> >>> -----Original Message-----
>> >>> From: Robert Spurrier [mailto:robert.spurr...@explorys.com]
>> >>> Sent: Monday, September 09, 2013 9:27 AM
>> >>> To: dev@ctakes.apache.org
>> >>> Subject: Creating Runnable .JARs From A Subset of cTAKES Maven
>> >>> Modules
>> >>>
>> >>> Good Morning!
>> >>>
>> >>> I am trying to use cTAKES tools on a distributed computing platform.
>> >>> I would rather not ship the entire compiled cTAKES package (~1.5 GB)
>> >>> out to the shared cache when I only need a few annotators and their
>> >>> resources at a time.
>> >>>
>> >>> I should first mention that I am not very familiar with Maven. I
>> >>> recently upgraded cTAKES from v 2.5.0, where I was configuring
>> >>> smaller pipelines using ant build files. This process was cumbersome,
>> >>> however, and I can appreciate the new modular Maven project layout.
>> >>> I just do not know how to effectively utilize it in a way that is
>> >>> flexible.
>> >>>
>> >>> Does anyone have any advice on how I can package subsets of cTAKES
>> >>> annotator modules and their dependencies/resources, so I can create
>> >>> 'thinner' custom pipelines that are geared towards specific tasks?
>> >>>
>> >>> For example, I might ultimately want a pipeline .JAR that contains
>> >>> the tools to RegEx Left Ventricular Ejection Fraction measurements
>> >>> from free text. In such a .JAR I would not need any of the
>> >>> dictionary resources or negation annotators, so they could be
>> >>> excluded.
>> >>>
>> >>> It looks like I could create Maven assembly plugin descriptors to
>> >>> generate these custom .JARs, but I would like to see if anyone here
>> >>> has any advice/caveats before I pursue this route.
>> >>>
>> >>>
>> >>> Thanks,
>> >>> Robert Spurrier
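
The per-pipeline assembly described in the thread could be sketched roughly as the descriptor below. It is an illustration under assumptions, not the descriptor actually used: the file location and the `pipeline` id are hypothetical, and the layout simply mirrors Maven's stock jar-with-dependencies descriptor.

```xml
<!-- Hypothetical location: custom-base-pipeline/src/main/assembly/pipeline.xml -->
<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
  <!-- The id is appended to the artifact name as a classifier,
       e.g. custom-base-pipeline-<version>-pipeline.jar -->
  <id>pipeline</id>
  <formats>
    <format>jar</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <dependencySets>
    <dependencySet>
      <!-- Unpack the module's own classes and its runtime dependencies
           (the selected annotator modules) into a single jar. -->
      <outputDirectory>/</outputDirectory>
      <useProjectArtifact>true</useProjectArtifact>
      <unpack>true</unpack>
      <scope>runtime</scope>
    </dependencySet>
  </dependencySets>
</assembly>
```

For the descriptor to take effect, each pipeline module would need to bind the maven-assembly-plugin `single` goal to the `package` phase and point it at this file. Running `mvn package -pl custom-base-pipeline -am` from the parent directory then builds only that module plus the modules it depends on. A second descriptor with a `zip` or `tar.gz` format and a `fileSets` section would be one way to produce the separate resource archive for the DistributedCache.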