Hi Manuel,

The default clinical pipeline runs a piper file located in ctakes-core-res [1].
If you are running from a cTAKES binary build, which appears to be the case, you
can find the file at:
Resources/org/apache/ctakes/core/pipeline/DefaultFastPipeline.piper

You can edit this file and add a different writer at the end.  There are many
file writers available, more than I have time to describe fully; below is a
partial list.

pretty.html.HtmlTextWriter
pretty.plaintext.PrettyTextWriterFit
property.plaintext.PropertyTextWriterFit
CuiCountFileWriter
CuiListFileWriter
CuiLookupLister
HtmlTableCasConsumer
SentenceTokensPrinter
TextSpanWriter
TokenFreqCasConsumer
TokenOffsetsCasConsumer

As you have seen, XMI output contains everything under the sun.  The first
three writers in the list produce output containing the most commonly desired
information (CUIs, negation, uncertainty, etc.).  The rest produce more focused
output.  You can add the whole list to the end of the piper file mentioned
above, prefixing each entry with the "add " command, or add them individually.
Then make sure that you specify "-o <outputDirectory>" on your command line.
Some of the older writers may not recognize -o as the way to set their output
directory, in which case you may need to set their output location another way.
A name ending in "CasConsumer" is a good giveaway that the writer is one of the
older types.
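For example, after adding the first three writers and one of the more focused
ones, the end of DefaultFastPipeline.piper might look something like the sketch
below.  This is a minimal, untested sketch: it assumes the writer names resolve
exactly as listed above when prefixed with "add", and that lines starting with
"//" are treated as comments by the piper reader.

```
// Appended to the end of DefaultFastPipeline.piper (sketch; comment syntax assumed)
// Writers for the most commonly desired output (CUIs, negation, uncertainty, etc.)
add pretty.html.HtmlTextWriter
add pretty.plaintext.PrettyTextWriterFit
add property.plaintext.PropertyTextWriterFit
// One of the more focused writers, added individually
add CuiListFileWriter
```

You would then run something like
runClinicalPipeline.bat -i input -o output --user username --pass password
(your original command with -o in place of --xmiOut; keep --xmiOut as well if
you still want the XMI files).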

There is a JdbcWriterTemplate that was built to write to a database, but it 
requires a fair amount of configuration.

Sean

[1]  https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files



-----Original Message-----
From: Manuel Lamy [mailto:mmvp...@gmail.com] 
Sent: Sunday, March 04, 2018 10:29 PM
To: dev@ctakes.apache.org
Subject: Output formats - CPE - cTAKES - Persist in database [EXTERNAL]

Hello everyone,

I'm using the cTAKES clinical pipeline to process a large batch of documents.

I'm running this command from the command line:

    runClinicalPipeline.bat -i input --xmiOut output --user username --pass password

This works (with my own credentials and paths, of course). My problem is that I
can only produce output in XMI format.

My questions are the following:

- Is it possible to output a format other than XMI? If so, what should I change
in this command, and what formats are available?

- I would like to persist the structured clinical information extracted by
cTAKES directly in a database. Is there a format better suited to that task? At
the moment I can only output XMI, so I built a Perl parser that uses a lot of
regular expressions to extract the information from the XMI file and persist it
in a database. However, the XMI file has a complex structure, and the script,
although it works well, takes longer than it should to run and persist the data.

If someone could give me some advice about my options, I would appreciate it.

Best regards,

Manuel
