FWIW, the method that we use for doing large batches is to create a pipeline descriptor using uimafit, create a reader such as FilesInDirectoryCollectionReader or UriCollectionReader, and then use a JCasIterable to wrap a for-loop around every document. This lets you collect statistics or data structures from every document, say, and then do something with them at the end.

    // create engine and reader ...

    // loop over documents:
    for (JCas jcas : new JCasIterable(readerDescription, aggregate.createAggregateDescription())) {
        // handle jcas for one document
    }

    // any other code to finish up ...

Alternatively, if you can easily handle everything within the pipeline you can just use SimplePipeline:

    SimplePipeline.runPipeline(collectionReader, aggregateBuilder.createAggregate(), xWriter);

Tim

On Sun, 2017-01-22 at 14:36 +0000, Arron Lacey wrote:
> Thanks very much Sean. Didn't work unfortunately - but I am curious if
> you don't personally use the CPE, how do you batch process documents?
>
> I would like to just run the AggregatePlaintextFastUMLSProcessor.xml on
> all files in a given directory - perhaps with *some* control over the
> output filenames.
>
> Thanks,
>
> Arron.
>
> On Fri, 20 Jan, 2017 at 4:38 PM, Finan, Sean
> <sean.fi...@childrens.harvard.edu> wrote:
> >
> > Hi Arron Lacey,
> >
> > That particular cas consumer java class is a uimafit-paradigm
> > implementation, and from my memory the CPE gui does not play well
> > with Uimafit. I could be wrong - I never use the cpe anymore.
> >
> > You might be able to get things working by changing line #23 in the
> > .xml file from
> >
> > <implementationName>org.apache.ctakes.core.cc.XmiWriterCasConsumerCtakes</implementationName>
> >
> > To
> > <implementationName>org.apache.uima.tools.components.XmiWriterCasConsumer</implementationName>
> >
> > As far as I know the ctakes version is the same as the uima version
> > but with better output file naming and a uimafit framing.
> >
> > Again, I'm not certain that the problem is cpe : uimafit
> > incompatibility. If somebody else out there knows better then please
> > speak up.
> >
> > Good luck,
> > Sean
> >
> > -----Original Message-----
> > From: Arron Lacey [mailto:a.s.la...@swansea.ac.uk]
> > Sent: Friday, January 20, 2017 11:13 AM
> > To: dev@ctakes.apache.org
> > Subject: Cannot load XMIWriterCasConsumer.xml with CPE.sh
> >
> > Hi - I am trying to use the CPI to output results using the CAS
> > Consumer: __XmiWriterCasConsumer.xml
> >
> > but here is the error message I am getting:
> >
> > > org.apache.uima.resource.ResourceInitializationException
> > > CausedBy: org.apache.uima.resource.ResourceConfigurationException
> > > CausedBy: java.lang.Exception: The component XMI Writer CAS Consumer
> > > cannot be created (Thread name: Thread-4)
> >
> > My setup is using:
> >
> > Collection Reader
> >
> > > desc/ctakes-core/desc/collection_reader/FilesInDirectoryCollectionReader.xml
> >
> > Analysis Engine
> >
> > > desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml
> >
> > CAS Consumer
> >
> > > desc/ctakes-core/desc/cas_consumer/__XmiWriterCasConsumer.xml
> >
> > I can get the normal XML writer to work, so I would like to ask what I
> > need to do to my pipeline to use the XMI Writer?
> >
> > Thanks very much,
> >
> > Arron Lacey.