Re: Creating Runnable .JARs From A Subset of cTAKES Maven Modules

Robert Spurrier Tue, 01 Oct 2013 11:56:04 -0700

Hello Pei,

Since my usage is purely local for now, I checked out a separate copy of
cTAKES from SVN. I package any customizations/changes and deploy them to my
local Maven repo. I then use these local/custom dependencies in my
'Pipelines' project.
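
A minimal sketch of how a pipeline module might pick up such a locally
built artifact, assuming the customized cTAKES module has already been
installed into the local repository (for example with 'mvn install' from
the cTAKES checkout). The version string is a placeholder for illustration,
not a value taken from this thread:

```xml
<!-- Hypothetical fragment of a pipeline module's pom.xml. -->
<dependencies>
  <dependency>
    <groupId>org.apache.ctakes</groupId>
    <artifactId>ctakes-core</artifactId>
    <!-- Placeholder: use whatever version the local cTAKES build installs. -->
    <version>LOCAL-CUSTOM-VERSION</version>
  </dependency>
</dependencies>
```

Maven checks the local repository before any remote one, so a locally
installed build with this version is what the pipeline would resolve.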


Thanks,
Rob

On 10/1/13 2:48 PM, "Pei Chen" <[email protected]> wrote:

>Rob,
>Are you pulling the existing ctakes dependencies from Maven Central, or
>did you have to recreate the ctakes modules in a local repo of some sort?
>It would be good to make ctakes flexible enough to do what you described
>(hence separating out modules and resources into their own modules).
>--Pei
>
>
>On Tue, Oct 1, 2013 at 2:06 PM, Robert Spurrier <
>[email protected]> wrote:
>
>> It's been a while, but just to update in case anyone is watching this:
>>
>> My goal was to create a project full of annotators (both cTAKES and
>> home-grown), and "cherry-pick" from them at will to create smaller
>> pipelines that could be launched on a hadoop grid via MapReduce.
>>
>> My final setup consisted of two Maven aggregator projects, Annotators
>> and Pipelines.
>>
>> Annotators is an aggregator project containing all of the annotators and
>> their resources.  I am essentially following the cTAKES layout for this
>> one. One annotator, one module.
>> E.g.:
>> Annotators
>>         -ctakes-core-annotator
>>                 Pom.xml
>>         -ctakes-pos-tagger-annotator
>>                 Pom.xml
>>         -custom-annotator-one
>>                 Pom.xml
>> ParentPom.xml
>>
>>
>> Pipelines is another aggregator project containing the source code to
>> generate the pipelines, and the job files that utilize the pipelines on
>> the hadoop grid (effectively serving as the input reader & CAS
>> consumer). Each pipeline is its own Maven module, and spits out a .jar
>> that contains all of the classes I need to run a UIMA-MapReduce job for
>> that specific pipeline. It also creates a resource archive (model files,
>> etc.) that I ship off to the Hadoop DistributedCache.
>> E.g.:
>> Pipelines
>>         -custom-base-pipeline
>>                 Pom.xml
>>         -observation-pipeline
>>                 Pom.xml
>> ParentPom.xml
>>
>>
>>
>> Notes:
>> -I modified the cTAKES pom to put all of the descriptors into each
>> individual annotator jar alongside the classes, just so they can
>> conveniently be called by name. The "heavier" resources are put on the
>> DistributedCache.
>>
>> -I create individual pipeline distributions in the Pipelines project by
>> using the Maven reactor options (-pl/-am) at the parent project level.
>> E.g. "mvn package -pl custom-base-pipeline -am". This builds
>> custom-base-pipeline with all of its dependencies, and all of the
>> necessary resources.
>>
>> -Each pipeline has its own Maven assembly to specify what should be
>> included with that pipeline's distribution and resources.
>>
>>
>> The point of this was to maximize modularity, pipeline flexibility,
>> runtime speed, and to keep my pipeline jars as lightweight as possible.
>> Though it has many awesome features, I did not want to run every part of
>> cTAKES every time.
>>
>>
>> Cheers,
>> Rob
>>
>>
>> On 9/9/13 11:23 AM, "Robert Spurrier" <[email protected]>
>> wrote:
>>
>> >Actually, after poking around in the Maven documentation, I think I have
>> >just figured out an approach I like.
>> >
>> >For each pipeline I wish to create, I will generate a Maven assembly
>> >descriptor. I will reference each assembly file from the cTAKES root
>> >pom.xml. Hopefully this will create each pipeline for me when I run
>> >'package'. This approach will still tie in nicely with the project
>> >object model/lifecycle of cTAKES, and generate all my custom jars as
>> >well.
>> >
>> >I will try it out and update this thread with the results.
>> >
>> >Thanks,
>> >Rob
>> >
>> >
>> >On 9/9/13 10:38 AM, "Chen, Pei" <[email protected]> wrote:
>> >
>> >>Hi Robert,
>> >>
>> >>Are you planning a process to build everything from source?
>> >>Or were you planning to have a build process that combines the
>> >>ctakes-*** jars with your custom application jars?
>> >>
>> >>--Pei
>> >>
>> >>> -----Original Message-----
>> >>> From: Robert Spurrier [mailto:[email protected]]
>> >>> Sent: Monday, September 09, 2013 9:27 AM
>> >>> To: [email protected]
>> >>> Subject: Creating Runnable .JARs From A Subset of cTAKES Maven Modules
>> >>>
>> >>> Good Morning!
>> >>>
>> >>> I am trying to use cTAKES tools on a distributed computing platform. I
>> >>> would rather not ship the entire compiled cTAKES package (~1.5 GB) out
>> >>> to the shared cache when I only need a few annotators and their
>> >>> resources at a time.
>> >>>
>> >>> I should first mention that I am not very familiar with Maven. I
>> >>> recently upgraded cTAKES from v2.5.0, where I was configuring smaller
>> >>> pipelines using ant build files. This process was cumbersome, however,
>> >>> and I can appreciate the new modular Maven project layout. I just do
>> >>> not know how to effectively utilize it in a way that is flexible.
>> >>>
>> >>> Does anyone have any advice on how I can package subsets of cTAKES
>> >>> annotator modules and their dependencies/resources, so I can create
>> >>> 'thinner' custom pipelines that are geared towards specific tasks?
>> >>>
>> >>> For example, I might ultimately want a pipeline .JAR that contains the
>> >>> tools to extract Left Ventricular Ejection Fraction measurements from
>> >>> free text with RegEx. In such a .JAR I would not need any of the
>> >>> dictionary resources or negation annotators, so they could be excluded.
>> >>>
>> >>> It looks like I could create Maven assembly plugin descriptors to
>> >>> generate these custom .JARs, but I would like to see if anyone here has
>> >>> any advice/caveats before I pursue this route.
>> >>>
>> >>>
>> >>> Thanks,
>> >>> Robert Spurrier
>> >
>> >
>>
>>
>>
