Hello Gandhi,

What I'm actually looking for is to persist the diseases, medications,
anatomical regions, procedures and signs/symptoms found by cTAKES in a
database.

I have thousands of clinical records to process, a lot of them, and
performance is already a concern for me. So I'm studying what my options
are for doing this.

For the first experimental stage of my research, I just output the
results in XMI format (I didn't know any better) and wrote a script that
uses regexes to extract all the findings. Needless to say, even with a
really good script and plenty of processing capacity, this is just not
feasible for thousands of records, because it takes far too long to run.
Another mistake I made was to use a SQLite database. I will need to use
something clearly more powerful and scalable like MySQL from now on.
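For the database side, one thing that usually matters more than the choice
of engine is batching: insert all findings for a document in one statement
and commit once, instead of one commit per row. A minimal sketch of that
pattern, assuming a hypothetical table
findings(doc_id, category, begin_offset, end_offset, covered_text). It uses
Python's built-in sqlite3 only so it runs anywhere; with a MySQL driver such
as mysql-connector-python the same DB-API calls apply, but the placeholders
become %s instead of ?.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical row layout: (doc_id, category, begin, end, covered_text)
Finding = Tuple[str, str, int, int, str]

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS findings (
    doc_id       TEXT    NOT NULL,
    category     TEXT    NOT NULL,
    begin_offset INTEGER NOT NULL,
    end_offset   INTEGER NOT NULL,
    covered_text TEXT
)
"""

def persist_findings(conn: sqlite3.Connection, rows: Iterable[Finding]) -> int:
    """Insert a batch of findings in a single transaction; return the row count."""
    batch: List[Finding] = list(rows)
    with conn:  # commits once on success, rolls back on an exception
        conn.executemany(
            "INSERT INTO findings "
            "(doc_id, category, begin_offset, end_offset, covered_text) "
            "VALUES (?, ?, ?, ?, ?)",
            batch,
        )
    return len(batch)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    persist_findings(conn, [
        ("note_001", "Medication", 10, 17, "aspirin"),
        ("note_001", "DiseaseDisorder", 30, 42, "hypertension"),
    ])
    print(conn.execute("SELECT COUNT(*) FROM findings").fetchone()[0])
```

Committing per document (or per few hundred documents) keeps transaction
overhead low without holding one huge transaction open for the whole run.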

My problem now is deciding which path to take. I've tried all the outputs
listed by Sean (all the different writers), and none of them seems easier to
process than the XMI. I would just like something more basic, such as a
txt file for each processed record, with one row listing the medications
discovered, another row with the diseases discovered, another with the
procedures, and so on. Something straightforward.
Another option would be to use JdbcWriter, but I can't find any good
documentation to get started with it.
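Instead of regexes, a real XML parser can read the XMI in one pass and
produce exactly that kind of one-row-per-category summary. A minimal sketch
with Python's standard-library ElementTree, assuming the cTAKES 4.x type
system, where mentions are serialized as direct children of the XMI root
(e.g. textsem:MedicationMention) carrying begin/end character offsets into
the document text held in the sofaString attribute of cas:Sofa. The
namespace URIs and element names below are assumptions to check against
your own XMI files; summarize_xmi and format_summary are hypothetical
helper names.

```python
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed namespaces used in cTAKES XMI output; verify against your files.
TEXTSEM_NS = "http:///org/apache/ctakes/typesystem/type/textsem.ecore"
CAS_NS = "http:///uima/cas.ecore"

# Assumed element names for the categories of interest -> row labels.
CATEGORIES = {
    "MedicationMention": "Medications",
    "DiseaseDisorderMention": "Diseases",
    "ProcedureMention": "Procedures",
    "SignSymptomMention": "Signs/Symptoms",
    "AnatomicalSiteMention": "Anatomical sites",
}

def summarize_xmi(xmi_text: str) -> dict:
    """Return {row label: [covered text, ...]} for one XMI document."""
    root = ET.fromstring(xmi_text)
    sofa = root.find(f"{{{CAS_NS}}}Sofa")
    doc_text = sofa.get("sofaString", "") if sofa is not None else ""
    found = defaultdict(list)
    for elem in root:
        # Namespaced tags look like "{namespace}LocalName".
        ns, _, local = elem.tag[1:].partition("}")
        if ns == TEXTSEM_NS and local in CATEGORIES:
            begin, end = int(elem.get("begin")), int(elem.get("end"))
            found[CATEGORIES[local]].append(doc_text[begin:end])
    return dict(found)

def format_summary(summary: dict) -> str:
    """One line per category, e.g. 'Medications: aspirin; ibuprofen'."""
    lines = [f"{label}: {'; '.join(summary.get(label, []))}"
             for label in CATEGORIES.values()]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Usage (hypothetical): python summarize_xmi.py output/*.xmi
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            summary = summarize_xmi(f.read())
        with open(path + ".txt", "w", encoding="utf-8") as out:
            out.write(format_summary(summary))
```

The same summarize_xmi output could also be turned into database rows
instead of a txt file, so the parsing and persistence steps stay separate.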

Could you give me some suggestions about which path to take? Thanks a
lot!

Best regards,

Manuel Lamy

2018-03-05 3:51 GMT+00:00 Gandhi Rajan Natarajan <gandhi.natara...@arisglobal.com>:

> Hi Manuel,
>
> As far as I know, cTAKES also supports pretty-print and HTML output. For
> more information, have a look at the cTAKES demo webapp code at
> https://github.com/healthnlp/examples/blob/master/ctakes-web-client/src/main/java/org/apache/ctakes/web/client/servlet/DemoServlet.java
>
> Also, if you are looking for help parsing the XML output, have a look at
> the XML parsing code in the beta version of the cTAKES REST service at
> https://github.com/GoTeamEpsilon/ctakes-rest-service/blob/master/ctakes-web-rest/src/main/java/org/apache/ctakes/rest/util/XMLParser.java
>
> Regards,
> Gandhi
>
>
> -----Original Message-----
> From: Manuel Lamy [mailto:mmvp...@gmail.com]
> Sent: Monday, March 05, 2018 8:59 AM
> To: dev@ctakes.apache.org
> Subject: Output formats - CPE - cTAKES - Persist in database
>
> Hello everyone,
>
> I'm using the cTAKES clinical pipeline to process a large batch of
> documents.
>
> I'm using this command on the command line:
>
>     runClinicalPipeline.bat -i input --xmiOut output --user username --pass password
>
> This works (adapted to my credentials and paths, of course). My problem
> is that I can only output in XMI format.
>
> My questions are the following:
>
> - Is it possible to output a format other than XMI? If so, what should I
> change in this command, and what formats are available?
>
> - I want to persist the structured clinical information extracted by
> cTAKES directly in a database. Is there a format better suited to that
> task? At the moment, I can only output in XMI format. I built a parser in
> Perl with a lot of regexes to process all the information in the XMI file
> and persist it in a database. However, the XMI file has a complex
> structure, and the script, despite working well, takes longer to run and
> persist than it should.
>
> If someone could give me some advice about my options, I would
> appreciate it.
>
> Best regards,
>
> Manuel
>
