Hi Manuel,

The default clinical pipeline runs a piper file located in ctakes-core-res [1]. If you are running from a cTAKES binary build, which is how it looks, you can find the file at:

Resources/org/apache/ctakes/core/pipeline/DefaultFastPipeline.piper
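
As a concrete picture of the kind of edit involved, here is a minimal sketch of lines appended to the bottom of that piper file. The three writer names are taken from the partial list in this message. Two things are assumptions rather than confirmed facts: that these short names resolve as written in your build, and that a line starting with "//" is treated as a comment. Nothing else in the file is shown or changed.

```text
// Sketch only: lines appended at the bottom of DefaultFastPipeline.piper.
// Assumes the short writer names below resolve as written in your build.
add pretty.html.HtmlTextWriter
add pretty.plaintext.PrettyTextWriterFit
add property.plaintext.PropertyTextWriterFit
```

Keeping a backup copy of the unedited piper file makes it easy to revert if one of the added writers does not load.
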

You can edit this file and add a different writer at the end / bottom. There are a lot of file writers available, more than I have time to fully describe, but below is a partial list.

pretty.html.HtmlTextWriter
pretty.plaintext.PrettyTextWriterFit
property.plaintext.PropertyTextWriterFit
CuiCountFileWriter
CuiListFileWriter
CuiLookupLister
HtmlTableCasConsumer
SentenceTokensPrinter
TextSpanWriter
TokenFreqCasConsumer
TokenOffsetsCasConsumer

As you have seen, XMI output contains everything under the sun. The first three writers in the list create output with information that is most commonly desired (CUIs, negation, uncertainty, etc.). The rest are more focused in their output.

You can add the whole list to the end of the piper file mentioned above, prefixing each with the "add " command, or just add them individually. Then make sure that you specify "-o <outputDirectory>" in your command line. Some of the older writers may not accept -o as a valid parameter value specifier, in which case you may need to do something different. Ending with "CasConsumer" is a good giveaway that the writer is one of the older types.

There is a JdbcWriterTemplate that was built to write to a database, but it requires a fair amount of configuration.

Sean

[1] https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files

-----Original Message-----
From: Manuel Lamy [mailto:mmvp...@gmail.com]
Sent: Sunday, March 04, 2018 10:29 PM
To: dev@ctakes.apache.org
Subject: Output formats - CPE - cTAKES - Persist in database [EXTERNAL]

Hello everyone,

I'm using cTAKES clinical pipeline in order to process a lot of documents in a row. I'm using this command in the command line:

runClinicalPipeline.bat -i input --xmiOut output --user username --pass password

This works, adapted to my credentials and my paths of course. My problem is that I can only output in XMI format. My questions are the following:

-Is it possible to output a different kind of format than XMI?
If yes, what should I change in this command and what are the available formats?
-I am interested in persisting the structured clinical information extracted by cTAKES directly in a database. Is there a format that is more suitable for that task?

At the moment, I can only output in XMI format. I built a parser in Perl with a lot of regexes in order to process all the information in the XMI file and persist it in a database. However, the XMI file has a complex structure and the script, despite working well, takes more time than it should to run and persist.

If someone could give me some advice about what my possibilities are, I would appreciate it.

Best regards,
Manuel