Copy of internal email for devlist archival purposes.

________________________________
From: Finan, Sean
Sent: Tuesday, December 17, 2019 10:08 AM
To: Akram
Subject: Re: How does cTAKES work? [EXTERNAL]

Hi Akram,

To clarify:
There are a lot of ways to run ctakes.
There are a lot of ways to build a ctakes pipeline.
There are a lot of ways to provide ctakes input.
There are a lot of formats for ctakes output.

I never use the CVD.
I never use xml AE files.
I never type or paste clinical text into a gui.
I never use XMI output files. (almost never)

However, many people do things differently. Your workflow just depends upon what information you want and how comfortable you are using different input and output formats, etc.

I run ctakes inside my IDE using the PiperFileRunner class.
I run ctakes piper files.
I use directories containing text files as input.
I produce html and custom output types. (normally)

For a new developer, I think that you can try this:
1. In your IDE, navigate through ctakes-clinical-pipeline-res to the directory src/main/resources/org/apache/ctakes/clinical/pipeline/
2. Make a copy of the DefaultFastPipeline.piper. Call it whatever you want, but for now keep it in that directory.
3. At the end of the piper file, append the line:
   writeHtml
4. If your IDE has a list of available maven profiles, enable the one named "runPiperGui"
   I use Intellij and this is extremely easy to do. You should be able to find instructions for your IDE online.
5. Run the maven compile step. If you see errors about glassfish not being found you can ignore them.
   A gui should launch. The gui will look mostly empty, but there are instructions in the bottom panel.
6. Follow the instructions in the gui:
   A) Click the top-left button with a Blue Folder and Gear. This will open a file chooser to load a piper file.
   B) Navigate to and open your piper file. See #1 above if you have forgotten the location.
      The gui will load your piper file and the upper-right panel will display the piper instructions.
   C) In the upper-left panel, fill in the values appropriately.
      a. Click the little folder icon at the right of the top row named "InputDirectory". This will open a file chooser.
      b. Navigate to and open the directory ctakes-examples-res/src/main/org/apache/ctakes/examples/notes/annotated/
      c. Click the little folder icon at the right of the 2nd row named "OutputDirectory". This will open a file chooser.
      d. Navigate to and open any directory you want for ctakes output.
      e. SKIP the line named "LookupXml". Leaving this line empty will tell ctakes to use the default dictionary and dictionary configuration.
      f. Enter your umls credentials in the lines named "umlsUser" and "umlsPass". You don't use a file chooser for this. Just click on the empty box and type.
         ! make sure that you press the Enter key after you type the username and password !
   D) Save your parameters to a .cli file. Click the button with a green arrow pointing down into a box. This will open a file chooser.
      Save a file with any name you like in any directory you like.
   E) Run your pipeline. Click the button with a green circle and running man.
      The window will turn grey and the bottom panel will show ctakes progress.
      The blue progress bar at the top should show the progress as ctakes processes each of the 15 example notes.
      When ctakes is complete the window will become enabled again (not grey)
7. Go to your output directory and look at the html files.

There are a lot of steps here, but that should cover everything.

This should show you that:
1. There is a piper that already exists for the default ctakes pipeline.
2. You can create your own piper files.
3. ctakes can run batches of text files.
4. There is a gui that you can use to run ctakes.
5. ctakes can produce easy to read html files.

Once you have done all of this you can modify each of the above:
1.a. Look at various piper files in ctakes that perform different tasks.
2.a. Copy other ctakes piper files and modify them, or copy commands in those piper files to your own piper file.
3.a. Process whatever documents you want.
4.a. There are other ways to run ctakes. From an IDE I use the PiperFileRunner class.
For the PiperFileRunner class you use parameters like you did in the gui. Each row in the gui has a 2nd column with "-i" or "--umlsUser". The PiperFileRunner accepts those as command-line parameters.
For instance, "PiperFileRunner -i my/input/dir/ --umlsUser myUserName"
5.a. You can find examples of other piper commands to add different writers. For instance:

   // XMI output
   writeXmis

   // Write Fast Healthcare Interoperability Resources (FHIR) json files. fhir.org
   package org.apache.ctakes.fhir.cc
   add FhirJsonFileWriter SubDirectory=FHIR

   // Write plaintext copy of note text with cui, semantic group, POS. Relations are listed.
   add pretty.plaintext.PrettyTextWriterFit SubDirectory=TEXT

   // Write plaintext copy of note sentences with entities and relations listed.
   add property.plaintext.PropertyTextWriterFit SubDirectory=PROP

   // Write Html files in a subdirectory to keep them separate from other output types.
   add pretty.html.HtmlTextWriter SubDirectory=HTML

I really hope that this helps.
Sean

________________________________
From: Akram <as...@yahoo.com>
Sent: Tuesday, December 17, 2019 7:40 AM
To: dev@ctakes.apache.org; Finan, Sean
Subject: Re: How does cTAKES work? [EXTERNAL]

Warning: Email originated outside Boston Children's. Don't click links/attachments unless you know sender & content seems safe.
________________________________

Many thanks for answering me. cTAKES is the core of my research and I am stuck.
How can I generate cTAKES output without CVD? Is there a command or GUI for that?
How can I generate other formats such as html or marked text?
------------------------------------------------------------------------------------
The way I know is:
There is a misunderstanding for sure here.
We feed CVD with text such as "This patient has diabetes and no signs for kidney failure".
We also provide Run > Load AE with the pipeline we are going to use.
Once we click on Run AE, cTAKES works and analyses the provided text.
Then we save the results as .XMI, which can be taken to any tool such as UIMA to display the results visually. Am I right here?

Thanks

On Tuesday, 17 December 2019, 02:54:03 am AEDT, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

Hi Akram,

Gandhi has provided some good links, and I agree that you should read that information.

In case you haven't found it, there is also a "quick start" manual on this page:
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0
Under "Documentation", there is a download named "A pamphlet/manual on cTAKES basics". It was meant to have an accompanying human tutor, but it does contain some handy information.

> I can get CVD run on the binary version of cTAKES.
-- Excellent!

> but I have problem on the Developer version.
-- Are you using an IDE? There might be a maven profile listed named "runCVD". You can try to compile ctakes with that profile. From a command line: "mvn compile -PrunCVD"
-- Regardless, if you've already built a binary then at least you can run it there. The CVD is not a ctakes product, but is bundled with uima. So if you change ctakes code CVD will still remain the same.
-- https://uima.apache.org/d/uimaj-current/tools.html#ugr.tools.cvd

I think that there is a misunderstanding here:
> When I try to Load AE on the CVD (Development Version) I get this error
-- The CVD is meant to display output from ctakes.
-- If you run ctakes to produce an .xmi file(s) then you can load the .xmi file into the CVD and view what ctakes discovered in the document.

While the CVD is very good for debugging and roaming details, you can also produce simpler output types such as html and marked text. Other output types might be easier for new users, and they do not require running a second tool (CVD).

Sean

________________________________________
From: Akram <as...@yahoo.com.INVALID>
Sent: Sunday, December 15, 2019 6:35 AM
To: dev@ctakes.apache.org
Subject: Re: How does cTAKES work? [EXTERNAL]

Thanks Gandhi

I can get CVD run on the binary version of cTAKES.
but I have problem on the Developer version.
When I try to Load AE on the CVD (Development Version) I get this error

When I try to load : AggregatePlaintextProcessor.xml
I get Error : org.apache.uima.resource.ResourceInitializationException:
More detailed information in the log file

When I try to load : AggregatePlaintextFastUMLSProcessor.xml
I get Error : org.apache.uima.resource.ResourceInitializationException: an import could not be resolved.
No file with name "org/apache/ctakes/drugner/types/TypeSystem.xml" was found in the class path or data path (Descriptor:file:/D:/cTAKES/ctakes-drug-ner/desc/analysis_engine/DrugMentionAnnotator.xml)
More detailed information in the log file

(P.S. I changed CTAKES_HOME to D:\cTAKES which is the development folder that has all code)

How can I fix that?

=================================================
Error 1:
<date>2019-12-15T22:23:11</date>
<millis>1576408991483</millis>
<sequence>7</sequence>
<logger>org.apache.uima</logger>
<level>SEVERE</level>
<class>org.apache.uima.tools.cvd.MainFrame</class>
<method>handleException(527)</method>
<thread>24</thread>
<message>Exception occurred</message>
<exception>
  <message>org.apache.uima.resource.ResourceInitializationException</message>
  <frame>
    <class>org.apache.ctakes.core.resource.LuceneIndexReaderResourceImpl</class>
    <method>load</method>
    <line>82</line>
  </frame>

=================================================
Error 2:
<date>2019-12-15T22:32:54</date>
<millis>1576409574952</millis>
<sequence>9</sequence>
<logger>org.apache.uima</logger>
<level>SEVERE</level>
<class>org.apache.uima.tools.cvd.MainFrame</class>
<method>handleException(527)</method>
<thread>24</thread>
<message>An import could not be resolved. No file with name "org/apache/ctakes/drugner/types/TypeSystem.xml" was found in the class path or data path. (Descriptor: file:/J:/__cTAKES/ctakes-drug-ner/desc/analysis_engine/DrugMentionAnnotator.xml)</message>
<exception>
  <message>org.apache.uima.resource.ResourceInitializationException: An import could not be resolved. No file with name "org/apache/ctakes/drugner/types/TypeSystem.xml" was found in the class path or data path. (Descriptor: file:/J:/__cTAKES/ctakes-drug-ner/desc/analysis_engine/DrugMentionAnnotator.xml)</message>
  <frame>
    <class>org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl</class>
    <method>initialize</method>
    <line>165</line>
  </frame>
  <frame>
    <class>org.apache.uima.impl.AnalysisEngineFactory_impl</class>
    <method>produceResource</method>
    <line>94</line>
  </frame>

On Sunday, 15 December 2019, 05:46:24 pm AEDT, gandhi rajan <gandhiraja...@gmail.com> wrote:

Hi Akram,

I would prefer to have a look at the following link:
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+User+Install+Guide

And for loading custom dictionaries you gotta look at the following link:
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+Dictionaries+and+Models

On Sun, Dec 15, 2019 at 9:19 AM Akram <as...@yahoo.com.invalid> wrote:

> Hi
> I have 2 questions, and would appreciate the help.
> The first question
> ==================
> I have been trying to get how cTAKES works and not so much luck
> I know that we build .piper file through "cTAKES Simple Pipeline Fabricator"
> we get a .piper file
> Then the process is not clear
> my understanding and I am not sure if I am right here
> is that we create a .xml file from the .piper file through
> I tried using "cTAKES Pipe File Submitter"
> I loaded HelloWorld.piper but I got this error
> org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key No Analysis Component found for ContextDependentTokenizerAnnotator
> I read Sean's email in the mail archive
> and Replaced
> add ContextDependentTokenizerAnnotator
> with
> add org.apache.ctakes.contexttokenizer.ae.ContextDependentTokenizerAnnotator
> but still getting the same error
>
> The 2nd question
> ===============
> After creating the .xml the next step will be
> Loading the result file in the "CAS Visual Debugger (CVD)"
> and build .xmi file
> but what is next?
> How can I view the NER values? and where is the coloured screen that highlights each token in a colour and each NER in a different colour?
> and where do I train cTAKES to take ICD10 or SNOMED or any other dataset?
>
>

--
Regards,
Gandhi

"The best way to find urself is to lose urself in the service of others !!!"