> Sean,
Thank you for the detailed reply. As you mentioned, I had to revert the capital letters added by Outlook. Also, in case somebody else wants to use the code and cannot get it to run: the getFilesInDir method needs to return the populated Collection<File> fileList, the variable final File[] fileList and its usages should be renamed to something else (as the variable name already exists), and the main method needs to throw an IOException. I think these were all the changes I made so that the txt files from a folder are added to the collection. Many thanks again.

What I am looking to do is also what the description in "ExampleAggregatePipeline" says: "running a pipeline programatically w/o uima xml descriptor xml files". As I understand it, this is accomplished by the uimaFIT classes, so that AEs can be defined in Java, added to a pipeline and run directly. The uimaFIT page gives a nice Java snippet that uses uimaFIT in a similar way to the cTAKES example; I pasted the few Java lines below at [1].

http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.introduction

I would like to use cTAKES in my own Java programs such that, just like in ExampleAggregatePipeline, uimaFIT can be used to create and run a cTAKES pipeline to annotate medical texts. Then I could also output the result as CAS files, just like the CVD GUI does. This would also allow me to add or modify my own AnalysisEngines directly.

Essentially, I want to know how to set up the cTAKES objects correctly as a pipeline in a Java program, so that medical texts are annotated the way the GUI does it. I would really appreciate any hints on how to accomplish this.

Following your code example to read the files, the outlined idea is:

for ( File file : files ) {
    final String note = getTextInFile( file );
    JCas jCas = JCasFactory.createJCas();
    jCas.setDocumentText(note);
    // 1. create the AnalysisEngines for tokenizer, tagger and other cTAKES components etc. to annotate medical texts
    // 2. runPipeline(jCas, ...);
}

[1] The code snippet from uimaFIT:

JCas jCas = JCasFactory.createJCas();
jCas.setDocumentText("some text");
AnalysisEngine tokenizer = createEngine(MyTokenizer.class);
AnalysisEngine tagger = createEngine(MyTagger.class);
runPipeline(jCas, tokenizer, tagger);
for(Token token : iterate(jCas, Token.class)){
    System.out.println(token.getTag());
}

Tol O.

Finan, Sean <Sean.Finan@...> writes:

>
> Hi Tol (and Maite),
>
> I'm not entirely certain that I understand the question, but here is an attempt to help. If I'm
> oversimplifying then I apologize.
>
> I think that ExampleAggregatePipeline is intended to represent a very simple single-note pipeline and
> that custom code could be produced by using it as an example.
>
> If you want to process texts in a directory, you can find with a web search plenty of ways to list files in a
> directory and read text from files. org.apache.ctakes.core.cr.FilesInDirectoryCollectionReader
> might be what you used in the CPE, and you can certainly peruse the code and take what you need. Or, if you
> decide to write a simple diy, here is one possibility:
>
> Static public Collection<File> getFilesInDir( final File directory ) {
>     final Collection<File> fileList = new ArrayList<>();
>     final File[] fileList = directory.listFiles();
>     if ( fileList == null ) {
>         System.err.println( "please check the directory " + directory.getAbsolutePath() );
>         System.exit( 1 );
>     }
>     for ( final File file : directory.listFiles() ) {
>         if ( file.canRead() ) {
>             fileList.add( file );
>         }
>     }
> }
>
> Static public String getTextInFile( final File file ) throws IOException { -- or handle ioE herein
>     final Path nioPath = file.toPath();
>     return new String( Files.readAllBytes( nioPath ) );
> }
>
> Static public void main( String ... args ) {
>     If ( args[0].isEmpty() ) {
>         System.out.println( "Enter a directory path" );
>         System.exit( 0 );
>     }
>     Final Collection<File> files = getFilesInDir( new File( args[0] );
>     For ( File file : files ) {
>         Final String note = getTextInFile( file );
>         --- Insert here code a' la ExampleAggregatePipeline ---
>         --- swap out the writer in ExampleAggregatePipeline with CasIOUtil method (below) ---
>     }
> }
>
> I must admit that I have never directly used it, but there is an xmi file writing method in
> org.apache.uima.fit.util.CasIOUtil named writeXmi( JCas jCas, File file ). You could give this a try
> and see if it produces the type of output that you want. The same utility class has a writeXCas(..) method.
>
> If the above has absolutely nothing to do with your needs then please send me a bulleted list of items,
> example workflow, etc. and I'll see if I can be of service.
>
> Oh, and I wrote the above code freehand, so MS Outlook is adding capital letters, etc. If you cut and paste
> you'll need to change that - plus I haven't run/compiled, so there might be a typo or missed exception or
> something. Or it may not work (in which case I'll throw in a little more effort).
>
> Sean
>
> -----Original Message-----
> From: Tol O. [mailto:toltox@...]
> Sent: Monday, February 02, 2015 6:56 PM
> To: dev@...
> Subject: Re: Question about the pipeline
>
> Maite Meseure Hugues <meseure.maite <at> ...> writes:
>
> >
> > Hello all,
> >
> > Thank you for your preceding answers.
> > I have a few questions regarding the pipeline example to run cTakes
> > programmatically.
> > I am running ExampleAggregatePipeline.java with
> > ExampleHelloWorldAnnotator but I would like to know how I can change
> > it to run my data, as the CPE where we can choose the directory of our data.
> > My second question is about the xml output generated with the CPE, can
> > I get the same xml output in using the example pipeline? and How?
> > Thanks for your time.
>
> I would like to ask the same question. After successfully setting up CTAKES following the Developers Guide
> I would also like to use a modified ExampleAggregatePipeline to output a CAS file identical to the output
> obtained by the CPE or the CVD when following the Users Guide.
>
> This would be a great help for developers as a starting class to be able to programmatically obtain an
> annotated file based on a plaintext or XML input, same as through the two GUIs.
>
> Right now I am reading through the Component Use Guide to replicate the CPE or the CVD tutorial with the test
> input, but it is a bit overwhelming.
>
> Any pointers or suggestions would be really appreciated.
>
> Tol O.
>
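
Addendum for readers who want to copy the file-reading helpers: the sketch below applies the three fixes listed at the top of this message (return the populated collection, rename the clashing File[] variable, let main throw IOException). It is only an illustration, not Sean's original code. The class name NoteFiles is made up, and a few choices are assumptions of this sketch rather than anything stated in the thread: files are read as UTF-8, subdirectories are skipped, an unreadable directory raises an IOException instead of calling System.exit, and main checks for a missing argument before reading args[0].

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collection;

// Hypothetical class name; the thread does not name the enclosing class.
public class NoteFiles {

    // Returns the readable regular files directly inside the given directory.
    public static Collection<File> getFilesInDir(final File directory) throws IOException {
        // Renamed from fileList so it no longer clashes with the collection below.
        final File[] candidates = directory.listFiles();
        if (candidates == null) {
            // Assumption: report the problem to the caller rather than exiting the JVM.
            throw new IOException("Please check the directory " + directory.getAbsolutePath());
        }
        final Collection<File> fileList = new ArrayList<>();
        for (final File file : candidates) {
            // Assumption: only regular files are wanted, so subdirectories are skipped.
            if (file.isFile() && file.canRead()) {
                fileList.add(file);
            }
        }
        return fileList; // the freehand version was missing this return
    }

    // Reads the whole file into a String. Assumption: the notes are UTF-8 encoded.
    public static String getTextInFile(final File file) throws IOException {
        return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
    }

    public static void main(final String... args) throws IOException {
        if (args.length == 0 || args[0].isEmpty()) {
            System.out.println("Enter a directory path");
            return;
        }
        for (final File file : getFilesInDir(new File(args[0]))) {
            final String note = getTextInFile(file);
            // Placeholder output; the cTAKES / uimaFIT pipeline call would go here.
            System.out.println(file.getName() + ": " + note.length() + " characters");
        }
    }
}
```

The loop body in main is where the JCas creation and the runPipeline call from the outline earlier in this message would go; this sketch deliberately stops short of that, since it uses only the Java standard library.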
