[jira] [Commented] (TIKA-1642) Integrate cTAKES into Tika

Hudson (JIRA) Mon, 08 Jun 2015 05:55:29 -0700

    [ https://issues.apache.org/jira/browse/TIKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577127#comment-14577127 ]


Hudson commented on TIKA-1642:
------------------------------

FAILURE: Integrated in tika-trunk-jdk1.7 #739 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/739/])
Fix indents to match http://tika.apache.org/contribute.html#Code_Formatting 
TIKA-1642 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1684170)
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESAnnotationProperty.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESConfig.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESContentHandler.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESSerializer.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESUtils.java


> Integrate cTAKES into Tika
> --------------------------
>
>                 Key: TIKA-1642
>                 URL: https://issues.apache.org/jira/browse/TIKA-1642
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Selina Chu
>            Assignee: Chris A. Mattmann
>             Fix For: 1.9
>
>
> [~gostep] has written a preliminary version of 
> [CTAKESContentHandler|https://github.com/giuseppetotaro/CTAKESContentHadler] 
> to integrate [Apache cTAKES|http://ctakes.apache.org/] into Tika.
> The CTAKESContentHandler allows the following steps to be performed in Tika:
> * create an AnalysisEngine based on a given XML descriptor;
> * create a CAS (Common Analysis System) appropriate for this AnalysisEngine;
> * populate the CAS with the text extracted by Tika;
> * run the AnalysisEngine against the plain text added to the CAS;
> * write out the results in the given format (XML, XCAS, XMI, etc.).
> It would be a great improvement if we could parse the output of cTAKES and 
> create a list of metadata that describes the terms found in the annotation 
> index and their corresponding tokens. For instance, using the 
> AggregatePlaintextFastUMLSProcessor analysis engine, we can use the UMLS 
> database to obtain the annotations related to DiseaseDisorderMention, and I 
> would like to be able to produce a list of the words in the input text that 
> are annotated as DiseaseDisorderMention.
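
The metadata idea in the description above can be illustrated with a minimal
sketch. It does not use the cTAKES or UIMA APIs: the Span class, the
groupCoveredText method and the sample sentence are all hypothetical
stand-ins, assuming only that each annotation exposes a type name and
begin/end character offsets into the extracted text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AnnotationMetadataSketch {

    /** Hypothetical stand-in for an annotation: a type name plus character offsets. */
    static final class Span {
        final String type;
        final int begin; // inclusive offset into the text
        final int end;   // exclusive offset into the text

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Groups the text covered by each span under its annotation type,
     * preserving the order in which spans are given. Spans whose offsets
     * fall outside the text are skipped.
     */
    static Map<String, List<String>> groupCoveredText(String text, List<Span> spans) {
        Map<String, List<String>> byType = new LinkedHashMap<>();
        for (Span s : spans) {
            if (s.begin < 0 || s.end > text.length() || s.begin > s.end) {
                continue; // ignore malformed offsets
            }
            byType.computeIfAbsent(s.type, k -> new ArrayList<>())
                  .add(text.substring(s.begin, s.end));
        }
        return byType;
    }

    public static void main(String[] args) {
        // Made-up input; in the proposal the text would come from Tika
        // and the spans from the cTAKES annotation index.
        String text = "Patient has diabetes and hypertension.";
        int d = text.indexOf("diabetes");
        int h = text.indexOf("hypertension");
        List<Span> spans = Arrays.asList(
            new Span("DiseaseDisorderMention", d, d + "diabetes".length()),
            new Span("DiseaseDisorderMention", h, h + "hypertension".length()));
        System.out.println(groupCoveredText(text, spans));
    }
}
```

Each map entry could then be copied into a multi-valued metadata field named
after the annotation type, which is one possible reading of the "list of
metadata" requested in the issue.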



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
