Hello Sean,

Thanks for the quick response, as always. I've tried several of those writers, and none of them gives me what I need to conduct my research successfully.
What I'm aiming for is output that is easy to process (the opposite of the XMI I get now), so that I can easily persist it in a database afterwards. What I want to persist is only the diseases, medications, anatomical regions, clinical procedures and signs/symptoms associated with each clinical record passed to cTAKES. So clearly I just want the most standard findings made by cTAKES, nothing out of the ordinary.

I can think of three options to accomplish this:

1. Try to work with the JdbcWriterTemplate. Judging by its name, this would fit my needs. But from what I've seen, people usually have problems getting it to work properly, since the configuration is not straightforward. So I guess this would be a rough path to take. What do you think? Read my other two options and maybe you'll understand my doubts.

2. Have an output so straightforward that I could write a script with a few regexes to extract the clinical entities I want (listed above). I'm thinking of a txt file that would just contain something like:

"Diseases -> disease a, disease b \n Medications -> medication a, medication b, etc"

This way I could just run a script and grab all the clinical entities. Processing would be much faster than with the XMI, since the file would have just a few lines with what I want. Of the formats that I tried and got working, none seems easy to process.

3. This one would probably be rough, but maybe write my own writer that would produce output like that described in option 2.

So Sean, I'm again in doubt about which path to take. I will have thousands of records coming in soon and I'll have to make a decision. I hope that, as always, you can help me take the most efficient path to get the job done. If I'm overestimating the difficulty of getting JdbcWriterTemplate to work, please tell me.
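
To make option 2 concrete, here is a minimal Python sketch of such a script. It assumes the hypothetical "Category -> item a, item b" text format described in option 2 (no existing cTAKES writer is claimed to produce it). The table name clinical_entity, the record ID, and the use of SQLite as the database are placeholders chosen only for illustration.

```python
import sqlite3


def parse_entities(text):
    """Parse lines of the (hypothetical) form 'Category -> item a, item b'."""
    entities = {}
    for line in text.splitlines():
        if "->" not in line:
            continue  # skip blank or unrelated lines
        category, _, items = line.partition("->")
        names = [item.strip() for item in items.split(",") if item.strip()]
        entities.setdefault(category.strip(), []).extend(names)
    return entities


def persist(conn, record_id, entities):
    """Store one row per (record, category, entity); return the row count."""
    # Table and column names are assumptions made for this sketch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clinical_entity ("
        "record_id TEXT, category TEXT, name TEXT)"
    )
    rows = [
        (record_id, category, name)
        for category, names in entities.items()
        for name in names
    ]
    conn.executemany("INSERT INTO clinical_entity VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    sample = "Diseases -> asthma, diabetes\nMedications -> aspirin, metformin\n"
    conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only
    print(persist(conn, "record-001", parse_entities(sample)))
```

With a format this simple, splitting on "->" and "," is enough, so no regex is needed. The difficult part remains getting cTAKES to emit such a file in the first place.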
I've had the Dev version of cTAKES for several months now, so I'm already fairly familiar with the system.

Thanks again!

Best regards,
Manuel

2018-03-05 15:35 GMT+00:00 Finan, Sean <sean.fi...@childrens.harvard.edu>:

> Hi Manuel,
>
> The default clinical pipeline runs a piper file located in ctakes-core-res. If you are running using a ctakes binary build, which is how it looks, you can find the file in:
> Resources/org/apache/ctakes/core/pipeline/DefaultFastPipeline.piper
>
> You can edit this file and add a different writer at the end / bottom. There are a lot of file writers available, more than I have time to fully describe, but below is a partial list.
>
> pretty.html.HtmlTextWriter
> pretty.plaintext.PrettyTextWriterFit
> property.plaintext.PropertyTextWriterFit
> CuiCountFileWriter
> CuiListFileWriter
> CuiLookupLister
> HtmlTableCasConsumer
> SentenceTokensPrinter
> TextSpanWriter
> TokenFreqCasConsumer
> TokenOffsetsCasConsumer
>
> As you have seen, xmi output contains everything under the sun. The first three writers in the list create output with information that is most commonly desired (cuis, negation, uncertainty, etc.). The rest are more focused in their output. You can add the whole list to the end of the piper file mentioned above, prefixing each with the "add " command, or just add them individually. Then make sure that you specify "-o <outputDirectory>" in your command line. Some of the older writers may not accept -o as a valid parameter value specifier, in which case you may need to do something different. Ending with "CasConsumer" is a good giveaway that the writer is one of the older types.
>
> There is a JdbcWriterTemplate that was built to write to a database, but it requires a fair amount of configuration.
>
> Sean
>
> https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
>
> -----Original Message-----
> From: Manuel Lamy [mailto:mmvp...@gmail.com]
> Sent: Sunday, March 04, 2018 10:29 PM
> To: email@example.com
> Subject: Output formats - CPE - cTAKES - Persist in database [EXTERNAL]
>
> Hello everyone,
>
> I'm using cTAKES clinical pipeline in order to process a lot of documents in a row.
>
> I'm using this command in the command line: runClinicalPipeline.bat -i input --xmiOut output --user username --pass password
>
> This works, adapted to my credentials and my paths of course. My problem is that I can only output in XMI format.
>
> My questions are the following:
>
> -Is it possible to output a different kind of format than XMI? If yes, what should I change in this command and what are the available formats?
>
> -It is of my interest to persist the structured clinical information extracted by cTAKES directly in a database. Is there a format that is more suitable to that task? At the moment, I can only output in XMI format. I built a parser in Perl with a lot of regex in order to process all the information in the XMI file and persist in a database. However, the XMI file has a complex structure and the script, despite of working well, is taking more time than it should to run and persist.
>
> If someone could give me some advice about what my possibilities are, I would be appreciated.
>
> Best regards,
>
> Manuel