[jira] [Commented] (TIKA-1645) Extraction of biomedical information using CTAKESParser

Chris A. Mattmann (JIRA) Thu, 04 Jun 2015 16:24:06 -0700

    [ https://issues.apache.org/jira/browse/TIKA-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573782#comment-14573782 ]


Chris A. Mattmann commented on TIKA-1645:
-----------------------------------------

Hey Giuseppe:

I tried out your latest patch. It didn't seem to work; maybe there is something 
I'm doing wrong here.

{noformat}
[chipotle:~/tmp/tika1.9] mattmann% java -classpath tika-app/target/tika-app-1.10-SNAPSHOT.jar:${CTAKES_HOME}/desc:${CTAKES_HOME}/resources:${CTAKES_HOME}/lib/\*:. org.apache.tika.cli.TikaCLI --config=tika-config.xml -m gist5a56f8815bbb7374fddd-069bf364fb0a178a9321cc67b6e14b38d80c2446/i_Investigation.txt
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/mattmann/tmp/tika1.9/tika-app/target/tika-app-1.10-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/apache-ctakes-3.2.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j: reset attribute= "false".
log4j: Threshold ="null".
log4j: Level value for root is  [INFO].
log4j: root level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy HH:mm:ss} %5p %c{1} - %m%n].
log4j: Adding appender named [consoleAppender] to category [root].
Content-Length: 8128
Content-Type: application/x-isatab-investigation
X-Parsed-By: org.apache.tika.parser.EmptyParser
resourceName: i_Investigation.txt
{noformat}
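
The `X-Parsed-By: org.apache.tika.parser.EmptyParser` line above is the quickest sign that the cTAKES parser never ran on this file. A minimal sketch for checking that mechanically, assuming the `-m` output is a list of `Key: value` lines as shown, and assuming the parser's fully qualified name is `org.apache.tika.parser.ctakes.CTAKESParser` (inferred from the package path in the issue description, not confirmed here). The class `ParsedByCheck` and its methods are hypothetical helpers, not part of Tika:

```java
import java.util.ArrayList;
import java.util.List;

public class ParsedByCheck {
    // Assumed parser class name, inferred from the package path in the issue description.
    static final String EXPECTED = "org.apache.tika.parser.ctakes.CTAKESParser";

    /** Collects every X-Parsed-By value from "Key: value" metadata lines. */
    static List<String> parsedBy(String metadataOutput) {
        List<String> parsers = new ArrayList<>();
        for (String line : metadataOutput.split("\\R")) {
            int colon = line.indexOf(": ");
            if (colon > 0 && line.substring(0, colon).equals("X-Parsed-By")) {
                parsers.add(line.substring(colon + 2).trim());
            }
        }
        return parsers;
    }

    /** True if the assumed cTAKES parser class appears among the X-Parsed-By values. */
    static boolean ctakesRan(String metadataOutput) {
        return parsedBy(metadataOutput).contains(EXPECTED);
    }

    public static void main(String[] args) {
        // Metadata lines copied from the run above.
        String output = String.join("\n",
            "Content-Length: 8128",
            "Content-Type: application/x-isatab-investigation",
            "X-Parsed-By: org.apache.tika.parser.EmptyParser",
            "resourceName: i_Investigation.txt");
        System.out.println("parsers = " + parsedBy(output));
        System.out.println("cTAKES ran = " + ctakesRan(output));
    }
}
```

SLF4J and log4j lines in the same output are ignored because their key is not `X-Parsed-By`.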

I'm using your tika-config.xml attached to this issue. I changed the 
CTAKESConfig.properties to look like:

{noformat}
aeDescriptorPath=/usr/local/apache-ctakes-3.2.1/desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextUMLSProcessor.xml
text=false
annotationProps=BEGIN,END,ONTOLOGY_CONCEPT_ARR
separatorChar=:
metadata=Study Title,Study Description
UMLSUser=<OMITTED>
UMLPass=<OMITTED>
{noformat}
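
One thing worth ruling out is whether this properties file is visible on the classpath at all. The issue description says it is looked up as `org/apache/tika/parser/ctakes/CTAKESConfig.properties` under a classpath directory. A small sketch that asks the class loader for that resource; `ConfigOnClasspath` is a hypothetical helper class, and it assumes the lookup is a plain classpath resource lookup:

```java
import java.net.URL;

public class ConfigOnClasspath {
    // Resource path taken from the issue description; the class itself is a hypothetical helper.
    static final String CONFIG = "org/apache/tika/parser/ctakes/CTAKESConfig.properties";

    /** Returns the URL the loader resolves the resource to, or null if it is not visible. */
    static URL locate(ClassLoader loader, String resource) {
        return loader.getResource(resource);
    }

    public static void main(String[] args) {
        URL url = locate(ConfigOnClasspath.class.getClassLoader(), CONFIG);
        System.out.println(url == null
            ? "NOT on classpath: " + CONFIG
            : "Found: " + url);
    }
}
```

Run it with the same `-classpath` value as the Tika command, plus the directory holding the compiled class. With `.` as the last classpath entry, as in the command above, the file would have to sit at `./org/apache/tika/parser/ctakes/CTAKESConfig.properties` to be found.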

I downloaded:

https://gist.github.com/5a56f8815bbb7374fddd/download

per your offline instructions, and then tried the above command to parse it. Any 
ideas?

My $CTAKES_HOME is:

{noformat}
[chipotle:~/tmp/tika1.9] mattmann% echo $CTAKES_HOME
/usr/local/apache-ctakes-3.2.1
[chipotle:~/tmp/tika1.9] mattmann% ls $CTAKES_HOME
LICENSE             NOTICE              README              RELEASE_NOTES.html  
bin/                config/             desc/               lib/                
resources/
[chipotle:~/tmp/tika1.9] mattmann% 
{noformat}

I am using Apache cTAKES 3.2.1.

I also noticed that the * jar classpath include didn't work for me; it gives me 
this error:

{noformat}
[chipotle:~/tmp/tika1.9] mattmann% java -cp tika-app/target/tika-app-1.10-SNAPSHOT.jar:${CTAKES_HOME}/desc:${CTAKES_HOME}/resources:${CTAKES_HOME}/lib/* org.apache.tika.cli.TikaCLI
java: No match.
{noformat}
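
The `No match.` message appears to come from the shell rather than from Java: the unescaped `*` is globbed before `java` starts, whereas the first command passed it through as `\*`. The Java launcher itself expands a classpath entry ending in `/*` to the `.jar` and `.JAR` files in that directory. A rough sketch that mimics that expansion, assuming Unix-style paths; `WildcardClasspath` is a hypothetical helper, not the launcher's actual code:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WildcardClasspath {
    /** Expands entries ending in "/*" to the .jar/.JAR files directly inside that directory. */
    static List<String> expand(String classpath) {
        List<String> entries = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.equals("*") || entry.endsWith("/*")) {
                // Strip the trailing "/*"; a bare "*" means the current directory.
                File dir = new File(entry.equals("*") ? "." : entry.substring(0, entry.length() - 2));
                File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar") || name.endsWith(".JAR"));
                if (jars != null) { // null when the directory is missing or unreadable
                    Arrays.sort(jars);
                    for (File jar : jars) {
                        entries.add(jar.getPath());
                    }
                }
            } else {
                entries.add(entry); // plain jars and directories pass through unchanged
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Pass the classpath as one quoted argument so the shell does not glob it first.
        for (String entry : expand(args.length > 0 ? args[0] : "*")) {
            System.out.println(entry);
        }
    }
}
```

Escaping the wildcard as in the first command, or quoting the whole classpath argument, should let the `*` reach `java` intact.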



> Extraction of biomedical information using CTAKESParser
> -------------------------------------------------------
>
>                 Key: TIKA-1645
>                 URL: https://issues.apache.org/jira/browse/TIKA-1645
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Giuseppe Totaro
>            Assignee: Giuseppe Totaro
>              Labels: patch
>             Fix For: 1.10
>
>         Attachments: CTAKESConfig.properties, TIKA-1645.patch, 
> TIKA-1645.v02.patch, tika-config.xml
>
>
> As mentioned in [TIKA-1642|https://issues.apache.org/jira/browse/TIKA-1642], 
> [CTAKESContentHandler|https://github.com/giuseppetotaro/CTAKESContentHadler] 
> is preliminary work to integrate [Apache 
> cTAKES|http://ctakes.apache.org/] into Tika, allowing users to extract 
> biomedical information from clinical text.
> Essentially, this work includes a wrapper for CAS serializers that aim at 
> dumping out the identified annotations into XML-based formats.
> You can find in attachment a new patch that includes the CTAKESParser, a new 
> parser that decorates the AutoDetectParser and relies on a new version of 
> CTAKESContentHandler, based on feedback from 
> [TIKA-1642|https://issues.apache.org/jira/browse/TIKA-1642]. This parser 
> generates the same output as AutoDetectParser and, in addition, the metadata 
> containing the identified clinical annotations detected by cTAKES.
> To run a cTAKES AnalysisEngine using Tika's CTAKESParser, you first need 
> to install the latest stable release of cTAKES (3.2.2), following the 
> instructions on [User Install 
> Guide|https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+User+Install+Guide].
>  Then, you can launch Tika as follows:
> {noformat}
> CTAKES_HOME=/usr/local/apache-ctakes-3.2.2
> java -cp 
> tika-app-1.10-SNAPSHOT.jar:${CTAKES_HOME}/desc:${CTAKES_HOME}/resources:${CTAKES_HOME}/lib/*:/path/to/CTAKESConfig
>  org.apache.tika.cli.TikaCLI --config=/path/to/tika-config.xml /path/to/input
> {noformat}
> In the example above, {{/path/to/CTAKESConfig}} is the parent directory of 
> file {{org/apache/tika/parser/ctakes/CTAKESConfig.properties}} that contains 
> the configuration properties to build the cTAKES AnalysisEngine; 
> {{tika-config.xml}} is a custom configuration file for Tika that lists the 
> mimetypes that CTAKESParser will parse.
> You can find in attachment an example of both {{CTAKESConfig.properties}} and 
> {{tika-config.xml}} to parse ISA-Tab files using cTAKES.
> You need [UMLS credentials|https://uts.nlm.nih.gov/home.html] in order to use 
> the UMLS-based components of cTAKES.
> I would really appreciate your feedback.
> Thanks [~selina], [~chrismattmann] and [~lewismc] for supporting me on this 
> work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
