RE: Question about the pipeline

Finan, Sean Mon, 02 Feb 2015 17:13:39 -0800

Hi Tol (and Maite),

I'm not entirely certain that I understand the question, but here is an attempt 
to help.  If I'm oversimplifying then I apologize.

I think that ExampleAggregatePipeline is intended to represent a very simple 
single-note pipeline, and that you can write your own code by using it as an 
example.

If you want to process the texts in a directory, a web search will turn up 
plenty of ways to list the files in a directory and read the text in them.  
org.apache.ctakes.core.cr.FilesInDirectoryCollectionReader might be what you 
used in the CPE, and you can certainly peruse its code and take what you need.  
Or, if you decide to write a simple DIY version, here is one possibility:

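// imports needed by the methods below
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;

public static Collection<File> getFilesInDir( final File directory ) {
   // listFiles() returns null if the path is not a directory or cannot be read
   final File[] dirFiles = directory.listFiles();
   if ( dirFiles == null ) {
      System.err.println( "please check the directory " + directory.getAbsolutePath() );
      System.exit( 1 );
   }
   final Collection<File> fileList = new ArrayList<>();
   for ( final File file : dirFiles ) {
      // skip subdirectories and files we are not allowed to read
      if ( file.isFile() && file.canRead() ) {
         fileList.add( file );
      }
   }
   return fileList;
}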

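// Declares IOException; alternatively, catch and handle it in here
public static String getTextInFile( final File file ) throws IOException {
   final Path nioPath = file.toPath();
   // assumes UTF-8 text; change the charset if your notes use another encoding
   return new String( Files.readAllBytes( nioPath ), StandardCharsets.UTF_8 );
}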

public static void main( final String... args ) throws IOException {
   // guard against a missing argument as well as an empty one
   if ( args.length == 0 || args[0].isEmpty() ) {
      System.out.println( "Enter a directory path" );
      System.exit( 0 );
   }
   final Collection<File> files = getFilesInDir( new File( args[0] ) );
   for ( final File file : files ) {
      final String note = getTextInFile( file );
      // ---  Insert code here à la ExampleAggregatePipeline  ---
      // ---  Swap out the writer in ExampleAggregatePipeline with the
      //      CasIOUtil method (below)  ---
   }
}
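If you are on Java 7 or later, java.nio.file offers another way to do the same directory walk and file reading in one place. Here is a minimal sketch; the names DirectoryNotes and readNotes are made up for illustration, it only looks at the top level of the directory (no recursion), and it assumes the notes are UTF-8 text:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, not part of cTAKES or uimaFIT.
final class DirectoryNotes {

   private DirectoryNotes() {
   }

   /**
    * Reads every readable regular file directly inside {@code directory}
    * and returns its text keyed by file name, sorted by name.
    * Throws NotDirectoryException (an IOException) if the path is not a directory.
    */
   static Map<String, String> readNotes( final Path directory ) throws IOException {
      final Map<String, String> notes = new TreeMap<>();
      // try-with-resources closes the DirectoryStream when the loop is done
      try ( DirectoryStream<Path> stream = Files.newDirectoryStream( directory ) ) {
         for ( final Path path : stream ) {
            // skip subdirectories and anything we are not allowed to read
            if ( Files.isRegularFile( path ) && Files.isReadable( path ) ) {
               // assumes UTF-8; change the charset if your notes use another encoding
               final String text = new String( Files.readAllBytes( path ), StandardCharsets.UTF_8 );
               notes.put( path.getFileName().toString(), text );
            }
         }
      }
      return notes;
   }
}
```

Each value in the returned map is the note text you would hand to the pipeline inside the loop in main. The TreeMap just makes the processing order predictable; a plain list would work as well.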

I must admit that I have never used it directly, but there is an XMI 
file-writing method in org.apache.uima.fit.util.CasIOUtil named 
writeXmi( JCas jCas, File file ).  You could give it a try and see whether it 
produces the kind of output that you want.  The same utility class also has a 
writeXCas(..) method.
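writeXmi needs a target File for each note. One possible naming scheme is sketched below with a hypothetical helper (XmiOutputNames.xmiFileFor is not part of cTAKES or uimaFIT): it creates the output directory if needed and appends ".xmi" to the input file name, so that, for example, doc1.txt becomes doc1.txt.xmi:

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper class, not part of cTAKES or uimaFIT.
final class XmiOutputNames {

   private XmiOutputNames() {
   }

   /**
    * Returns the output file for one input note: the input file name plus ".xmi",
    * placed in outputDir. Creates outputDir first if it does not exist yet.
    */
   static File xmiFileFor( final File inputFile, final File outputDir ) throws IOException {
      // mkdirs() returns false on failure, and also if the directory already exists,
      // so only treat it as an error when the directory is still missing
      if ( !outputDir.isDirectory() && !outputDir.mkdirs() ) {
         throw new IOException( "Cannot create output directory " + outputDir.getAbsolutePath() );
      }
      // appending (rather than replacing) the extension keeps "a.txt" and "a.xml" distinct
      return new File( outputDir, inputFile.getName() + ".xmi" );
   }
}
```

Inside the loop in main, the returned File would be the second argument to CasIOUtil.writeXmi( jCas, ... ). I have not checked whether this matches the file names the CPE's own writer produces.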

If the above has absolutely nothing to do with your needs then please send me a 
bulleted list of items, example workflow, etc. and I'll see if I can be of 
service.

Oh, and I wrote the above code freehand, so MS Outlook is adding capital 
letters, etc.  If you cut and paste you'll need to change that - plus I haven't 
run/compiled, so there might be a typo or missed exception or something.  Or it 
may not work (in which case I'll throw in a little more effort).

Sean

-----Original Message-----
From: Tol O. [mailto:tol...@gmail.com] 
Sent: Monday, February 02, 2015 6:56 PM
To: dev@ctakes.apache.org
Subject: Re: Question about the pipeline

Maite Meseure Hugues <meseure.maite@...> writes:

> 
> Hello all,
> 
> Thank you for your preceding answers.
> I have a few questions regarding the pipeline example for running cTAKES 
> programmatically.
> I am running ExampleAggregatePipeline.java with 
> ExampleHelloWorldAnnotator, but I would like to know how to change 
> it to run on my data, as in the CPE, where we can choose the directory of our data.
> My second question is about the XML output generated by the CPE: can 
> I get the same XML output using the example pipeline? And how?
> Thanks for your time.

I would like to ask the same question. After successfully setting up cTAKES 
by following the Developers Guide, I would also like to use a modified 
ExampleAggregatePipeline to output a CAS file identical to the output produced 
by the CPE or the CVD when following the Users Guide.

It would be a great help to developers to have a starter class that 
programmatically produces an annotated file from plain-text or XML input, 
just as the two GUIs do.

Right now I am reading through the Component Use Guide to replicate the CPE or 
the CVD tutorial with the test input, but it is a bit overwhelming.

Any pointers or suggestions would be really appreciated.

Tol O.
