FW: UWM graduate student, need help on using CTakes

Chen, Pei Wed, 10 Apr 2013 12:22:25 -0700

Hi Soheil,
[including dev@ctakes]
I think this (1) is a pretty common use case.
One can configure the input directory, pipeline, and output directory, then run 
UIMA's Collection Processing Engine from the command line; see
ctakes-clinical-pipeline/desc/collection_processing_engine/test_plaintext.xml 
as an example.

But I wonder if we could simplify/enhance a set of CLI tools...

--Pei

From: Soheil Moosavi [mailto:[email protected]]
Sent: Wednesday, April 10, 2013 3:07 PM
To: Chen, Pei
Cc: Savova, Guergana; Rashmi Prasad
Subject: Re: UWM graduate student, need help on using CTakes

Dear Pei,

Thank you very much for your step-by-step, descriptive comments. I have read 
both the user guide and the developer guide for CTAKES on the Apache website. 
As far as I understand, the user guide lets the user run the CTAKES GUI and use 
the interface to work with components. The developer guide, on the other hand, 
teaches developers how to use the source code and add their own annotators, 
which requires getting the source code and working with it.

I'll explain in more detail how we usually use library files in our projects. 
There are two usages that I have in mind:
1. Here at UWM we install many tools on the server and let the students connect 
to the server and use the tools. For example, students can follow these steps 
to use the tokenizer, sentence splitter, POS tagger, lemmatizer and NER of the 
Stanford CoreNLP library:

     - =============================================
     - Connect to the unix server with your username and password
     - cd /data02/tools/StanfordCoreNLP/stanford-corenlp-2012-07-09
     - java -cp stanford-corenlp-2012-07-09.jar:stanford-corenlp-2012-07-06-models.jar:xom.jar:joda-time.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -file input.txt -outputExtension .stanfordnlp
     - -annotators specifies which annotators to run. You can change it if you 
don't need all of them.
     - -file specifies the input text file that contains your sentences.
     - The output for your input file is written to a file with the same name 
as the input file, plus .xml at the end of its name. For example, if the input 
file name is "input.txt", the output is written to "input.txt.xml". The 
output format is XML based.
     - =============================================

So users can easily call the jar file on the server, give it the input, and use 
the output as part of their program.
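
As a minimal sketch of that last point, a student's own Java program could 
assemble and launch the same command with the standard ProcessBuilder class. 
The class name CoreNlpCommand and its method names are hypothetical and exist 
only for illustration; the jar names, heap size, annotator list and flags are 
copied from the steps above, and the sketch assumes a Unix server where ':' is 
the classpath separator.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Stanford CoreNLP or CTAKES.
class CoreNlpCommand {

    // Jar names copied from the steps above; they are assumed to sit in the
    // working directory the command is launched from.
    static final String[] JARS = {
        "stanford-corenlp-2012-07-09.jar",
        "stanford-corenlp-2012-07-06-models.jar",
        "xom.jar",
        "joda-time.jar"
    };

    // Builds the argument list for the command shown in the steps above.
    static List<String> build(String annotators, String inputFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(String.join(":", JARS)); // ':' is the Unix classpath separator
        cmd.add("-Xmx3g");
        cmd.add("edu.stanford.nlp.pipeline.StanfordCoreNLP");
        cmd.add("-annotators");
        cmd.add(annotators);
        cmd.add("-file");
        cmd.add(inputFile);
        cmd.add("-outputExtension");
        cmd.add(".stanfordnlp");
        return cmd;
    }

    // Launches the command in the given directory, forwarding its console
    // output, and returns the exit code of the child process.
    static int run(List<String> cmd, String workingDir)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.directory(new File(workingDir));
        pb.inheritIO();
        return pb.start().waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command; call run(...) on the server to execute it.
        List<String> cmd = build("tokenize,ssplit,pos,lemma,ner", "input.txt");
        System.out.println(String.join(" ", cmd));
    }
}
```

Calling run(cmd, "/data02/tools/StanfordCoreNLP/stanford-corenlp-2012-07-09") 
on the server would execute the command in the directory from the steps above; 
the program can then read the generated output file.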

I am wondering whether such a jar file or library is available for CTAKES that 
lets users call it from the command line.

2. The other way that I use Stanford CoreNLP is to add the 
"stanford-corenlp-1.3.4.jar" file to my Java project and import its classes. 
Then I can call the classes and functions and use the output in my program.
So I am also wondering whether the same is possible with CTAKES, using it as a 
library of Java classes in my code.

These two ways of using jar or library files are very common. I would 
appreciate it if you could let us know whether these methods can be used in 
our projects.

I really appreciate your comments and advice.

Sincerely,

Soheil Moosavi