Fw: How does cTAKES work? [EXTERNAL]

Finan, Sean Tue, 17 Dec 2019 07:39:01 -0800

Copy of internal email for devlist archival purposes.

________________________________
From: Finan, Sean
Sent: Tuesday, December 17, 2019 10:08 AM
To: Akram
Subject: Re: How does cTAKES work? [EXTERNAL]

Hi Akram,

To clarify:

There are a lot of ways to run ctakes.

There are a lot of ways to build a ctakes pipeline.

There are a lot of ways to provide ctakes input.

There are a lot of formats for ctakes output.

I never use the CVD.

I never use xml AE files.

I never type or paste clinical text into a gui.

I never use XMI output files.

   (almost never)

However, many people do things differently.

Your workflow just depends upon what information you want and how comfortable 
you are using different input and output formats, etc.

I run ctakes inside my IDE using the PiperFileRunner class.

I run ctakes piper files.

I use directories containing text files as input.

I produce html and custom output types.

   (normally)

For a new developer, I think that you can try this:

1.  In your IDE, navigate through ctakes-clinical-pipeline-res to the directory 
src/main/resources/org/apache/ctakes/clinical/pipeline/

2.  Make a copy of DefaultFastPipeline.piper.  Call it whatever you want, 
but for now keep it in that directory.

3.  At the end of the piper file, append the line:

writeHtml

4.  If your IDE has a list of available maven profiles, enable the one named 
"runPiperGui"

   I use IntelliJ and this is extremely easy to do.  You should be able to find 
instructions for your IDE online.

5.  Run the maven compile step.

   If you see errors about glassfish not being found, you can ignore them.

   A gui should launch.

   The gui will look mostly empty, but there are instructions in the bottom 
panel.

6.  Follow the instructions in the gui:

A)  Click the top-left button with a Blue Folder and Gear.

   This will open a file chooser to load a piper file.

B)  Navigate to and open your piper file.  See #1 above if you have forgotten 
the location.

   The gui will load your piper file and the upper-right panel will display the 
piper instructions.

C)  In the upper-left panel, fill in the values appropriately.

   a.  Click the little folder icon at the right of the top row named 
"InputDirectory".

      This will open a file chooser.

   b.  Navigate to and open the directory 
ctakes-examples-res/src/main/resources/org/apache/ctakes/examples/notes/annotated/

   c.  Click the little folder icon at the right of the 2nd row named 
"OutputDirectory".

      This will open a file chooser.

   d.  Navigate to and open any directory you want for ctakes output.

   e.  SKIP the line named "LookupXml".

      Leaving this line empty will tell ctakes to use the default dictionary 
and dictionary configuration.

   f.  Enter your umls credentials in the lines named "umlsUser" and "umlsPass".

      You don't use a file chooser for this.  Just click on the empty box and 
type.

      ! make sure that you press the Enter key after you type the username and 
password !

D)  Save your parameters to a .cli file.  Click the button with a green arrow 
pointing down into a box.

      This will open a file chooser.  Save a file with any name you like in any 
directory you like.

E)  Run your pipeline.  Click the button with a green circle and running man.

   The window will turn grey and the bottom panel will show ctakes progress.

   The blue progress bar at the top should show the progress as ctakes 
processes each of the 15 example notes.

   When ctakes is complete the window will become enabled again (not grey).

7. Go to your output directory and look at the html files.

There are a lot of steps here, but that should cover everything.
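
As an optional shortcut for steps 2 and 3, here is a minimal Java sketch that 
copies a piper file and appends a command to the copy.  It is not part of 
ctakes: the class PiperCopier, the method copyAndAppend and the file name 
MyFirstPipeline.piper are made up for illustration, and the relative path 
assumes that you run it from the root of your ctakes checkout.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper, not part of ctakes: automates steps 2 and 3. */
final class PiperCopier {

    /**
     * Copies sourcePiper to a file named copyName in the same directory,
     * then appends one piper command to the copy.
     * Returns the path of the new piper file.
     */
    static Path copyAndAppend(Path sourcePiper, String copyName, String command)
            throws IOException {
        Path target = sourcePiper.resolveSibling(copyName);
        Files.copy(sourcePiper, target, StandardCopyOption.REPLACE_EXISTING);
        // Lead with a line separator in case the original has no trailing newline.
        String appended = System.lineSeparator() + command + System.lineSeparator();
        Files.write(target, appended.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.APPEND);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Directory from step 1; assumes the working directory is the ctakes root.
        Path source = Paths.get(
                "ctakes-clinical-pipeline-res/src/main/resources/org/apache/ctakes/clinical/pipeline",
                "DefaultFastPipeline.piper");
        // "MyFirstPipeline.piper" is an arbitrary example name.
        Path copy = copyAndAppend(source, "MyFirstPipeline.piper", "writeHtml");
        System.out.println("Wrote " + copy);
    }
}
```

The original piper file is left untouched, so you can repeat this with 
different names and different commands.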

This should show you that:

1.  There is a piper that already exists for the default ctakes pipeline.

2.  You can create your own piper files.

3.  ctakes can run batches of text files.

4.  There is a gui that you can use to run ctakes.

5.  ctakes can produce easy-to-read html files.

Once you have done all of this you can modify each of the above:

1.a.  Look at various piper files in ctakes that perform different tasks.

2.a.  Copy other ctakes piper files and modify them, or copy commands in those 
piper files to your own piper file.

3.a.  Process whatever documents you want.

4.a.  There are other ways to run ctakes.  From an IDE I use the 
PiperFileRunner class.

   For the PiperFileRunner class you use parameters like you did in the gui.

   Each row in the gui has a 2nd column with "-i" or "--umlsUser".

   The PiperFileRunner accepts those as command-line parameters.  For instance, 
"PiperFileRunner -i my/input/dir/ --umlsUser myUserName"

5.  You can find examples of other piper commands to add different writers.  
For instance:

// XMI output
writeXmis

// Write Fast Healthcare Interoperability Resources (FHIR) json files.  fhir.org
package org.apache.ctakes.fhir.cc
add FhirJsonFileWriter SubDirectory=FHIR

// Write plaintext copy of note text with cui, semantic group, POS.
// Relations are listed.
add pretty.plaintext.PrettyTextWriterFit SubDirectory=TEXT

// Write plaintext copy of note sentences with entities and relations listed.
add property.plaintext.PropertyTextWriterFit SubDirectory=PROP

// Write Html files in a subdirectory to keep them separate from other
// output types.
add pretty.html.HtmlTextWriter SubDirectory=HTML
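
To make the command-line form from 4.a more concrete, here is a minimal Java 
sketch that turns flag/value pairs into an argument list.  The class PiperArgs 
and the method toArgs are hypothetical, not part of ctakes.  Only the flags 
"-i" and "--umlsUser" come from the text above; read any others from the 2nd 
column of the gui.  The commented-out call assumes that PiperFileRunner has a 
standard main method and that ctakes is on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of ctakes: turns gui rows into arguments. */
final class PiperArgs {

    /** Flattens flag/value pairs, in insertion order, into an argument list. */
    static List<String> toArgs(Map<String, String> options) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> option : options.entrySet()) {
            args.add(option.getKey());
            args.add(option.getValue());
        }
        return args;
    }

    public static void main(String[] unused) {
        // LinkedHashMap keeps the flags in the order they were added.
        Map<String, String> options = new LinkedHashMap<>();
        // Flags and values taken from the example in 4.a.
        options.put("-i", "my/input/dir/");
        options.put("--umlsUser", "myUserName");
        List<String> args = toArgs(options);
        // Assumption: with ctakes on the classpath the list could be handed over,
        // e.g. PiperFileRunner.main(args.toArray(new String[0]));
        System.out.println("PiperFileRunner " + String.join(" ", args));
    }
}
```

Keeping the parameters in one map makes it easy to save them alongside your 
piper file, much like the .cli file saved from the gui.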

I really hope that this helps.

Sean

________________________________
From: Akram <[email protected]>
Sent: Tuesday, December 17, 2019 7:40 AM
To: [email protected]; Finan, Sean
Subject: Re: How does cTAKES work? [EXTERNAL]

________________________________
Many thanks for answering me,

cTAKES is the core of my research and I am stuck

How can I generate cTAKES output without CVD? Is there a command or GUI for that?

How can I generate other formats such as html or marked text?

------------------------------------------------------------------------------------

The way I know is:

There is a misunderstanding here for sure.

We feed CVD with text such as "This patient has diabetes and no signs for 
kidney failure"

We also provide the Run > Load AE with the pipeline we are going to use.

Once we click on Run AE, cTAKES works and analyses the provided text.

Then we save the results as .XMI which can be taken to any tool such as UIMA to 
display results visually

Am I right here?

Thanks

On Tuesday, 17 December 2019, 02:54:03 am AEDT, Finan, Sean 
<[email protected]> wrote:

Hi Akram,

Gandhi has provided some good links, and I agree that you should read that 
information.
In case you haven't found it, there is also a "quick start" manual on this 
page:  
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0
Under "Documentation", there is a download named "A pamphlet/manual on cTAKES 
basics".
It was meant to have an accompanying human tutor, but it does contain some 
handy information.

> I can get CVD run on the binary version of cTAKES.
-- Excellent!
> but I have problem on the Developer version.
-- Are you using an IDE?  There might be a maven profile listed named "runCVD". 
 You can try to compile ctakes with that profile.
From a command line: "mvn compile -PrunCVD"
-- Regardless, if you've already built a binary then at least you can run it 
there.  The CVD is not a ctakes product, but is bundled with uima.  So if you 
change ctakes code CVD will still remain the same.
--  
https://uima.apache.org/d/uimaj-current/tools.html#ugr.tools.cvd

I think that there is a misunderstanding here:
>When I try to Load AE on the CVD (Development Version) I get this error
-- The CVD is meant to display output from ctakes.
-- If you run ctakes to produce an .xmi file(s) then you can load the .xmi file 
into the CVD and view what ctakes discovered in the document.

While the CVD is very good for debugging and roaming details, you can also 
produce simpler output types such as html and marked text.
Other output types might be easier for new users, and they do not require 
running a second tool (CVD).

Sean

________________________________________
From: Akram <[email protected]>
Sent: Sunday, December 15, 2019 6:35 AM
To: [email protected]
Subject: Re: How does cTAKES work? [EXTERNAL]

**********************************************************************
Thanks Gandhi
I can get CVD run on the binary version of cTAKES.
but I have problem on the Developer version.
When I try to Load AE on the CVD (Development Version) I get this error
When I try to load : AggregatePlaintextProcessor.xml
I get Error : org.apache.uima.resource.ResourceInitializationException: More 
detailed information in the log file

When I try to load : AggregatePlaintextFastUMLSProcessor.xml
I get Error : org.apache.uima.resource.ResourceInitializationException: an 
import could not be resolved. No file with name 
"org/apache/ctakes/drugner/types/TypeSystem.xml"  was found in the class path 
or data path 
(Descriptor:file:/D:/cTAKES/ctakes-drug-ner/desc/analysis_engine/DrugMentionAnnotator.xml)
 More detailed information in the log file

P.S. I changed CTAKES_HOME to D:\cTAKES, which is the development folder that 
has all the code.
How can I fix that?
=================================================Error 1:
<date>2019-12-15T22:23:11</date>  <millis>1576408991483</millis>  
<sequence>7</sequence>  <logger>org.apache.uima</logger>  <level>SEVERE</level> 
 <class>org.apache.uima.tools.cvd.MainFrame</class>  
<method>handleException(527)</method>  <thread>24</thread>  <message>Exception 
occurred</message>  <exception>   
<message>org.apache.uima.resource.ResourceInitializationException</message>   
<frame>     
<class>org.apache.ctakes.core.resource.LuceneIndexReaderResourceImpl</class>    
 <method>load</method>     <line>82</line>   </frame> 
=================================================Error 2:

<date>2019-12-15T22:32:54</date>  <millis>1576409574952</millis>  
<sequence>9</sequence>  <logger>org.apache.uima</logger>  <level>SEVERE</level> 
 <class>org.apache.uima.tools.cvd.MainFrame</class>  
<method>handleException(527)</method>  <thread>24</thread>  <message>An import 
could not be resolved.  No file with name 
"org/apache/ctakes/drugner/types/TypeSystem.xml" was found in the class path or 
data path. (Descriptor: 
file:/J:/__cTAKES/ctakes-drug-ner/desc/analysis_engine/DrugMentionAnnotator.xml)</message>
  <exception>   
<message>org.apache.uima.resource.ResourceInitializationException: An import 
could not be resolved.  No file with name 
"org/apache/ctakes/drugner/types/TypeSystem.xml" was found in the class path or 
data path. (Descriptor: 
file:/J:/__cTAKES/ctakes-drug-ner/desc/analysis_engine/DrugMentionAnnotator.xml)</message>
   <frame>     
<class>org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl</class>
     <method>initialize</method>     <line>165</line>   </frame>   <frame>     
<class>org.apache.uima.impl.AnalysisEngineFactory_impl</class>     
<method>produceResource</method>     <line>94</line>   </frame>

    On Sunday, 15 December 2019, 05:46:24 pm AEDT, gandhi rajan 
<[email protected]> wrote:

Hi Akram,

I would suggest having a look at the following link:
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+User+Install+Guide

And for loading custom dictionaries you gotta look at the following link:
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+Dictionaries+and+Models

On Sun, Dec 15, 2019 at 9:19 AM Akram <[email protected]> wrote:

> Hi
> I have 2 questions, and would appreciate the help.
> The first question
> ==================
> I have been trying to understand how cTAKES works, without much luck
> I know that we build .piper file through "cTAKES Simple Pipeline
> Fabricator"
> we get a .piper file
> Then the process is not clear
> my understanding and I am not sure if I am right here
> is that we create a .xml file from the .piper file through
> I tried using "cTAKES Pipe File Submitter"
> I loaded HelloWorld.piper but I got this error
> org.apache.uima.resource.ResourceInitializationException: MESSAGE
> LOCALIZATION FAILED: Can't find resource for bundle
> java.util.PropertyResourceBundle, key No Analysis Component found for
> ContextDependentTokenizerAnnotator
> I read Sean's email in the mail archive
> and Replaced
> add ContextDependentTokenizerAnnotator
> with
> add org.apache.ctakes.contexttokenizer.ae.ContextDependentTokenizerAnnotator
> but still getting the same error
>
> The 2nd question
> ===============
> After creating the .xml the next step will be
> Loading the result file in the "CAS Visual Debugger (CVD)"
> and build .xmi file
> but what is next?
> How can I view the NER values? And where is the coloured screen that
> highlights each token in a colour and each NER in a different colour?
> And where do I train cTAKES to take ICD10 or SNOMED or any other dataset?
>
>

--
Regards,
Gandhi

"The best way to find urself is to lose urself in the service of others !!!"
