[
https://issues.apache.org/jira/browse/TIKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris A. Mattmann resolved TIKA-1642.
-------------------------------------
Resolution: Fixed
Fix Version/s: 1.9
Assignee: Chris A. Mattmann (was: Giuseppe Totaro)
- fixed!
{noformat}
bash-3.2$ svn commit -m "Fix for TIKA-1645 & TIKA-1642: Extraction of
biomedical information using CTAKESParser contributed by Selina Chu, Giuseppe
Totaro and mattmann."
Sending CHANGES.txt
Sending tika-bundle/pom.xml
Sending tika-parsers/pom.xml
Adding tika-parsers/src/main/java/org/apache/tika/parser/ctakes
Adding tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESAnnotationProperty.java
Adding tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESConfig.java
Adding tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESContentHandler.java
Adding tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESParser.java
Adding tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESSerializer.java
Adding tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESUtils.java
Sending tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
Transmitting file data ..........
Committed revision 1683968.
{noformat}
> Integrate cTAKES into Tika
> --------------------------
>
> Key: TIKA-1642
> URL: https://issues.apache.org/jira/browse/TIKA-1642
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Reporter: Selina Chu
> Assignee: Chris A. Mattmann
> Fix For: 1.9
>
>
> [~gostep] has written a preliminary version of
> [CTAKESContentHandler|https://github.com/giuseppetotaro/CTAKESContentHadler]
> to integrate [Apache cTAKES|http://ctakes.apache.org/] into Tika.
> The CTAKESContentHandler performs the following steps within Tika:
> * create an AnalysisEngine based on a given XML descriptor;
> * create a CAS (Common Analysis System) appropriate for this AnalysisEngine;
> * populate the CAS with the text extracted by using Tika;
> * run the AnalysisEngine against the plain text added to the CAS;
> * write out the results in the given format (XML, XCAS, XMI, etc.).
> It would be a great improvement if we could parse the output of cTAKES and
> create a list of metadata that describes the terms found in the annotation
> index and their corresponding tokens. For instance, using the
> AggregatePlaintextFastUMLSProcessor analysis engine, we can use the UMLS
> database to obtain the annotations of type DiseaseDisorderMention, and I
> would like to be able to produce a list of the words in the input text that
> are annotated as DiseaseDisorderMention.
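
The metadata list requested above could be built roughly as follows. This is only a minimal sketch in plain Java, not the code committed in r1683968: the AnnotationMetadata and Span classes and the toMetadata method are hypothetical names, and Span stands in for an annotation read from the annotation index (a type name plus character offsets into the input text). It groups the covered text by annotation type, so that all words annotated as DiseaseDisorderMention end up in one list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Tika, cTAKES, or UIMA.
final class AnnotationMetadata {

    // Stand-in for an annotation read from the annotation index:
    // a type name plus begin (inclusive) and end (exclusive) character offsets.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Groups the text covered by each span under its annotation type,
    // preserving the order in which the spans are supplied.
    static Map<String, List<String>> toMetadata(String text, List<Span> spans) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (Span s : spans) {
            if (s.begin < 0 || s.end > text.length() || s.begin > s.end) {
                continue; // skip offsets that do not fit the input text
            }
            metadata.computeIfAbsent(s.type, k -> new ArrayList<>())
                    .add(text.substring(s.begin, s.end));
        }
        return metadata;
    }

    public static void main(String[] args) {
        String text = "Patient has diabetes and asthma.";
        List<Span> spans = Arrays.asList(
                new Span("DiseaseDisorderMention", 12, 20),
                new Span("DiseaseDisorderMention", 25, 31));
        System.out.println(toMetadata(text, spans));
    }
}
```

In a real integration the spans would presumably come from the CAS after the AnalysisEngine has run, and each map entry could then be copied into the Tika metadata as a multi-valued field keyed by the annotation type.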
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)