FWIW, the way we handle large batches is to create a pipeline
descriptor using uimafit, create a reader such as
FilesInDirectoryCollectionReader or UriCollectionReader, and then
iterate over the documents with a JCasIterable in a for-loop. This
lets you collect statistics or data structures from every document,
say, and then do something with them at the end.

// create engine and reader
...

// loop over documents:
for (JCas jcas : new JCasIterable(readerDescription,
        aggregate.createAggregateDescription())) {
 // handle jcas for one document
}

// any other code to finish up
...
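
To make the "collect from every document, then finish up" shape concrete, here is a minimal stand-alone sketch. It is only an illustration under stated assumptions: plain Strings stand in for the per-document JCas so it runs without uimafit on the classpath, and the class name BatchStats and the method countWords are made up for this example. With a real pipeline, the outer loop would be the JCasIterable loop above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-in sketch: each String plays the role of one document's JCas.
// BatchStats and countWords are hypothetical names, not part of uimafit or cTAKES.
class BatchStats {

    // Loop over every document and accumulate a word-frequency table.
    static Map<String, Integer> countWords(Iterable<String> documents) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String text : documents) {
            // Handle one document (with uimafit this body would inspect the JCas).
            for (String word : text.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> documents = List.of("chest pain noted", "no chest pain");
        // Finish up: report the statistics gathered across all documents.
        countWords(documents).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

The point is that the accumulator lives outside the loop, so whatever you gather per document is still available once the last document has been processed.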



Alternatively, if you can easily handle everything within the pipeline
you can just use SimplePipeline:

SimplePipeline.runPipeline(
        collectionReader,
        aggregateBuilder.createAggregate(),
        xWriter);


Tim

On Sun, 2017-01-22 at 14:36 +0000, Arron Lacey wrote:
> Thanks very much Sean. Didn't work unfortunately - but I am curious:
> if you don't personally use the CPE, how do you batch process
> documents?
> 
> I would like to just run the AggregatePlaintextFastUMLSProcessor.xml
> on all files in a given directory - perhaps with *some* control over
> the output filenames.
> 
> Thanks,
> 
> Arron.
> 
> On Fri, 20 Jan, 2017 at 4:38 PM, Finan, Sean 
> <sean.fi...@childrens.harvard.edu> wrote:
> > 
> > Hi Arron Lacey,
> > 
> > That particular cas consumer java class is a uimafit-paradigm 
> > implementation, and from my memory the CPE gui does not play well 
> > with Uimafit.  I could be wrong - I never use the cpe anymore.
> > 
> > You might be able to get things working by changing line #23 in
> > the .xml file from
> > 
> > <implementationName>org.apache.ctakes.core.cc.XmiWriterCasConsumerCtakes</implementationName>
> > 
> > To
> > <implementationName>org.apache.uima.tools.components.XmiWriterCasConsumer</implementationName>
> > 
> > As far as I know the ctakes version is the same as the uima version
> > but with better output file naming and a uimafit framing.
> > 
> > Again, I'm not certain that the problem is cpe : uimafit
> > incompatibility.  If somebody else out there knows better then
> > please speak up.
> > 
> > Good luck,
> > Sean
> > 
> > -----Original Message-----
> > From: Arron Lacey [mailto:a.s.la...@swansea.ac.uk]
> > Sent: Friday, January 20, 2017 11:13 AM
> > To: dev@ctakes.apache.org
> > Subject: Cannot load XMIWriterCasConsumer.xml with CPE.sh
> > 
> > Hi - I am trying to use the CPE to output results using the CAS
> > Consumer: __XmiWriterCasConsumer.xml
> > 
> > 
> > but here is the error message I am getting:
> > 
> > > 
> > >  org.apache.uima.resource.ResourceInitializationException
> > >  CausedBy: org.apache.uima.resource.ResourceConfigurationException
> > >  CausedBy: java.lang.Exception: The component XMI Writer CAS Consumer
> > >  cannot be created (Thread name: Thread-4)
> > My setup is using:
> > 
> > Collection Reader
> > >  desc/ctakes-core/desc/collection_reader/FilesInDirectoryCollectionReader.xml
> > Analysis Engine
> > >  desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml
> > CAS Consumer
> > >  desc/ctakes-core/desc/cas_consumer/__XmiWriterCasConsumer.xml
> > I can get the normal XML writer to work, so I would like to ask
> > what I
> > need to do to my pipeline to use the XMI Writer?
> > 
> > Thanks very much,
> > 
> > Arron Lacey.
