[
https://issues.apache.org/jira/browse/TIKA-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris A. Mattmann resolved TIKA-1645.
-------------------------------------
Resolution: Fixed
Fix Version/s: 1.9 (was: 1.10)
Contributed! Thanks [~gostep] and [~selina]!
{noformat}
bash-3.2$ svn commit -m "Fix for TIKA-1645 & TIKA-1642: Extraction of
biomedical information using CTAKESParser contributed by Selina Chu, Giuseppe
Totaro and mattmann."
Sending CHANGES.txt
Sending tika-bundle/pom.xml
Sending tika-parsers/pom.xml
Adding tika-parsers/src/main/java/org/apache/tika/parser/ctakes
Adding tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESAnnotationProperty.java
Adding tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESConfig.java
Adding tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESContentHandler.java
Adding tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESParser.java
Adding tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESSerializer.java
Adding tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESUtils.java
Sending tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
Transmitting file data ..........
Committed revision 1683968.
{noformat}
Please note that improvements are welcome. I know Giuseppe is working on an
ExternalParser version of this, along with some other improvements, and Selina
is working on unit tests.
> Extraction of biomedical information using CTAKESParser
> -------------------------------------------------------
>
> Key: TIKA-1645
> URL: https://issues.apache.org/jira/browse/TIKA-1645
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Reporter: Giuseppe Totaro
> Assignee: Chris A. Mattmann
> Labels: patch
> Fix For: 1.9
>
> Attachments: CTAKESConfig.properties, TIKA-1645.patch,
> TIKA-1645.v02.patch, tika-config.xml
>
>
> As mentioned in [TIKA-1642|https://issues.apache.org/jira/browse/TIKA-1642],
> [CTAKESContentHandler|https://github.com/giuseppetotaro/CTAKESContentHadler]
> is preliminary work to integrate [Apache
> cTAKES|http://ctakes.apache.org/] into Tika, allowing users to extract
> biomedical information from clinical text.
> Essentially, this work includes a wrapper for CAS serializers, which write
> the identified annotations out in XML-based formats.
> Attached is a new patch that includes CTAKESParser, a new
> parser that decorates the AutoDetectParser and relies on a new version of
> CTAKESContentHandler, based on feedback from
> [TIKA-1642|https://issues.apache.org/jira/browse/TIKA-1642]. This parser
> generates the same output as AutoDetectParser and, in addition, metadata
> containing the clinical annotations identified by cTAKES.
> To run a cTAKES AnalysisEngine through Tika's CTAKESParser, you first need
> to install the latest stable release of cTAKES (3.2.2), following the
> instructions in the [User Install
> Guide|https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+User+Install+Guide].
> Then, you can launch Tika as follows:
> {noformat}
> CTAKES_HOME=/usr/local/apache-ctakes-3.2.2
> java -cp \
>   tika-app-1.10-SNAPSHOT.jar:${CTAKES_HOME}/desc:${CTAKES_HOME}/resources:${CTAKES_HOME}/lib/*:/path/to/CTAKESConfig \
>   org.apache.tika.cli.TikaCLI --config=/path/to/tika-config.xml /path/to/input
> {noformat}
> In the example above, {{/path/to/CTAKESConfig}} is the parent directory of
> the file {{org/apache/tika/parser/ctakes/CTAKESConfig.properties}}, which
> contains the configuration properties used to build the cTAKES AnalysisEngine;
> {{tika-config.xml}} is a custom Tika configuration file that lists the
> MIME types that CTAKESParser will parse.
> Attached are examples of both {{CTAKESConfig.properties}} and
> {{tika-config.xml}} for parsing ISA-Tab files using cTAKES.
> You need [UMLS credentials|https://uts.nlm.nih.gov/home.html] in order to use
> the UMLS-based components of cTAKES.
> I would really appreciate your feedback.
> Thanks [~selina], [~chrismattmann] and [~lewismc] for supporting me on this
> work.
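
The classpath layout described in the quoted issue is easy to get wrong, so here is a minimal sketch of one way to prepare it. It assumes a hypothetical `CONFIG_DIR` (standing in for `/path/to/CTAKESConfig`, defaulting to a `ctakes-config` directory under the current directory) and the cTAKES install location from the example. It only creates the package-style directory, checks whether the properties file is in place, and prints the launch command rather than running it.

```shell
#!/bin/sh
# Assumed locations (hypothetical defaults); adjust to your installation.
CTAKES_HOME="${CTAKES_HOME:-/usr/local/apache-ctakes-3.2.2}"
CONFIG_DIR="${CONFIG_DIR:-$PWD/ctakes-config}"   # stands in for /path/to/CTAKESConfig

# The properties file must sit under CONFIG_DIR at this relative path,
# so that CONFIG_DIR itself can be put on the classpath.
PROPS_REL="org/apache/tika/parser/ctakes/CTAKESConfig.properties"
mkdir -p "$CONFIG_DIR/$(dirname "$PROPS_REL")"

# Classpath entries in the same order as the launch command in the issue.
CP="tika-app-1.10-SNAPSHOT.jar:${CTAKES_HOME}/desc:${CTAKES_HOME}/resources:${CTAKES_HOME}/lib/*:${CONFIG_DIR}"

if [ -f "$CONFIG_DIR/$PROPS_REL" ]; then
  echo "config found: $CONFIG_DIR/$PROPS_REL"
else
  echo "config missing: copy CTAKESConfig.properties to $CONFIG_DIR/$PROPS_REL"
fi

# Print the command instead of executing it; the quoting keeps lib/* literal
# so that the JVM, not the shell, expands the wildcard.
echo java -cp "$CP" org.apache.tika.cli.TikaCLI \
  --config=/path/to/tika-config.xml /path/to/input
```

Copy the attached `CTAKESConfig.properties` into the directory the script creates, then run the printed command with the real paths to `tika-config.xml` and the input file.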
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)