Re: Question about the pipeline

Tol O. Tue, 03 Feb 2015 16:35:00 -0800

Sean,


Thank you for the detailed reply.

As you mentioned, I had to revert the capital letters that Outlook added.
In case somebody else wants to use the code and cannot get it to run, three
more changes are needed: the getFilesInDir method needs to return the
populated Collection<File> fileList; the variable final File[] fileList and
its usages should be renamed to something else (as that variable name
already exists); and the main method needs to throw an IOException.

I think these were all the changes I made so that the txt files from a
folder are added to the collection. Many thanks again.
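
For anyone following along, the changes listed above can be sketched roughly
as follows. This is a minimal sketch, not the exact code I ran: the class name
NoteFiles is hypothetical, the isFile() check, the args.length check and the
UTF-8 charset are my own assumptions, and only the three methods come from
Sean's message quoted below.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collection;

// Hypothetical class name; only the three methods come from the quoted message.
class NoteFiles {

   // Returns the readable regular files directly inside the given directory.
   static Collection<File> getFilesInDir( final File directory ) {
      final Collection<File> fileList = new ArrayList<>();
      // Renamed from fileList to avoid the duplicate variable name.
      final File[] entries = directory.listFiles();
      if ( entries == null ) {
         System.err.println( "please check the directory " + directory.getAbsolutePath() );
         System.exit( 1 );
      }
      for ( final File file : entries ) {
         // isFile() is an addition (assumption) that skips subdirectories.
         if ( file.isFile() && file.canRead() ) {
            fileList.add( file );
         }
      }
      // The original forgot to return the populated collection.
      return fileList;
   }

   // Reads the whole file into a String.
   static String getTextInFile( final File file ) throws IOException {
      // UTF-8 is an assumption; the original used the platform default charset.
      return new String( Files.readAllBytes( file.toPath() ), StandardCharsets.UTF_8 );
   }

   // main now declares IOException, which getTextInFile can throw.
   public static void main( final String... args ) throws IOException {
      // The length check is an addition; args[0] alone fails when no argument is given.
      if ( args.length == 0 || args[0].isEmpty() ) {
         System.out.println( "Enter a directory path" );
         System.exit( 0 );
      }
      final Collection<File> files = getFilesInDir( new File( args[0] ) );
      for ( final File file : files ) {
         final String note = getTextInFile( file );
         // Pipeline code in the style of ExampleAggregatePipeline would go here.
         System.out.println( file.getName() + ": " + note.length() + " characters" );
      }
   }
}
```

Running it with a directory path as the only argument should print one line
per readable file in that directory.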

What I am looking to do is also what the description in
"ExampleAggregatePipeline" says: "running a pipeline programatically w/o
uima xml descriptor xml files". As I understand it, this is accomplished by
the uimaFIT classes, so that AEs can be defined in Java, added to a pipeline
and run directly.

The uimaFIT page gives a nice Java snippet that uses uimaFIT in a similar
way to the cTAKES example; I pasted the few Java lines below at [1].
http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.introduction

I would like to use cTAKES in my own Java programs such that, just like the
ExampleAggregatePipeline, uimaFIT can be used to create and run a cTAKES
pipeline to annotate medical texts. Then I could also output the result in
CAS files, just as the CVD GUI does. This would allow me to add or modify
my own AnalysisEngines directly.

Essentially, I want to know how to set up the cTAKES objects correctly into
a pipeline in a Java program, so that medical texts are annotated as the
GUI does. I would really appreciate any hints on how to accomplish this.

Following your code example for reading the files, the idea in outline is:

for ( File file : files ) {
      final String note = getTextInFile( file );
      JCas jCas = JCasFactory.createJCas();
      jCas.setDocumentText(note);

      // 1. create the AnalysisEngines for the tokenizer, tagger and other
      //    cTAKES components etc. to annotate medical texts
      // 2. runPipeline(jCas, ...);
}

[1]
The code snippet from uimaFIT:

JCas jCas = JCasFactory.createJCas();
jCas.setDocumentText("some text");
AnalysisEngine tokenizer = createEngine(MyTokenizer.class);
AnalysisEngine tagger = createEngine(MyTagger.class);
runPipeline(jCas, tokenizer, tagger);
for(Token token : iterate(jCas, Token.class)){
    System.out.println(token.getTag());
}

Tol O.


Finan, Sean <Sean.Finan@...> writes:

> 
> Hi Tol (and Maite),
> 
> I'm not entirely certain that I understand the question, but here is an
> attempt to help.  If I'm oversimplifying then I apologize.
> 
> I think that ExampleAggregatePipeline is intended to represent a very
> simple single-note pipeline and that custom code could be produced by
> using it as an example.
> 
> If you want to process texts in a directory, you can find with a web
> search plenty of ways to list files in a directory and read text from
> files.  org.apache.ctakes.core.cr.FilesInDirectoryCollectionReader might
> be what you used in the CPE, and you can certainly peruse the code and
> take what you need.  Or, if you decide to write a simple diy, here is one
> possibility:
> 
> Static public Collection<File> getFilesInDir( final File directory ) {
>    final Collection<File> fileList = new ArrayList<>();
>    final File[] fileList = directory.listFiles();
>    if ( fileList == null ) {
>       System.err.println( "please check the directory " + directory.getAbsolutePath() );
>       System.exit( 1 );
>    }
>     for ( final File file : directory.listFiles() ) {
>         if ( file.canRead() ) {
>             fileList.add( file );
>         }
>     }
> } 
> 
> Static public String getTextInFile( final File file ) throws IOException {   -- or handle ioE herein
>    final Path nioPath = file.toPath();
>    return new String( Files.readAllBytes( nioPath ) );
> }
> 
> Static public void main( String ... args ) {
>    If ( args[0].isEmpty() ) {
>       System.out.println( "Enter a directory path" );
>       System.exit( 0 );
>    }
>    Final Collection<File> files = getFilesInDir( new File( args[0] );
>    For ( File file : files ) {
>       Final String note = getTextInFile( file );
>       ---  Insert here code a' la ExampleAggregatePipeline  ---
>       ---  swap out the writer in ExampleAggregatePipeline with CasIOUtil method (below)  ---
>    }
> }
> 
> I must admit that I have never directly used it, but there is an xmi file
> writing method in org.apache.uima.fit.util.CasIOUtil named
> writeXmi( JCas jCas, File file ).  You could give this a try and see if it
> produces the type of output that you want.  The same utility class has a
> writeXCas(..) method.
> 
> If the above has absolutely nothing to do with your needs then please send
> me a bulleted list of items, example workflow, etc. and I'll see if I can
> be of service.
> 
> Oh, and I wrote the above code freehand, so MS Outlook is adding capital
> letters, etc.  If you cut and paste you'll need to change that - plus I
> haven't run/compiled, so there might be a typo or missed exception or
> something.  Or it may not work (in which case I'll throw in a little more
> effort).
> 
> Sean
> 
> -----Original Message-----
> From: Tol O. [mailto:toltox@...] 
> Sent: Monday, February 02, 2015 6:56 PM
> To: dev@...
> Subject: Re: Question about the pipeline
> 
> Maite Meseure Hugues <meseure.maite <at> ...> writes:
> 
> > 
> > Hello all,
> > 
> > Thank you for your preceding answers.
> > I have a few questions regarding the pipeline example to run cTakes 
> > programmatically.
> > I am running ExampleAggregatePipeline.java with 
> > ExampleHelloWorldAnnotator but I would like to know how I can change 
> > it to run my data, as the CPE where we can choose the directory of our data.
> > My second question is about the xml output generated with the CPE, can 
> > I get the same xml output in using the example pipeline? and How?
> > Thanks for your time.
> 
> I would like to ask the same question. After successfully setting up
> CTAKES following the Developers Guide I would also like to use a modified
> ExampleAggregatePipeline to output a CAS file identical to the output
> obtained by the CPE or the CVD when following the Users Guide.
> 
> This would be a great help for developers as a starting class to be able
> to programmatically obtain an annotated file based on a plaintext or XML
> input, same as through the two GUIs.
> 
> Right now I am reading through the Component Use Guide to replicate the
> CPE or the CVD tutorial with the test input, but it is a bit overwhelming.
> 
> Any pointers or suggestions would be really appreciated.
> 
> Tol O.
> 
>
