Hi Sean, Thanks again for the detailed response.
I still couldn't get the superscript-1 co-reference in the piper GUI. I'm also not able to use "BackwardsTimeAnnotator" in the piper GUI, as it gives me the error below:

org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.temporal.ae.BackwardsTimeAnnotator" failed. (Descriptor: <unknown>)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
Caused by: java.lang.IllegalArgumentException: Please specify PARAM_IS_TRAINING - unable to infer it from context
    at org.cleartk.ml.CleartkAnnotator.initialize(CleartkAnnotator.java:109)

Somewhere in old mails it's mentioned that this is caused by missing dependencies, so I tried adding ClearTkAnnotator, with no luck yet. My piper file is as follows:

load AdvancedTokenizerPipeline.piper
add ContextDependentTokenizerAnnotator
add POSTagger
load ChunkerSubPipe.piper
load DictionarySubPipe.piper
add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
load AttributeCleartkSubPipe.piper
load RelationSubPipe.piper
load TemporalSubPipe.piper
load CorefSubPipe.piper
add org.apache.ctakes.temporal.ae.BackwardsTimeAnnotator
add pretty.html.HtmlTextWriter
add FileTreeXmiWriter

Any suggestions on this? I'm using all the latest 4.0.1 cTAKES jars.

Regarding the identification of names, I will dig deeper into what you have mentioned.

Sorry to ask this, as you already mentioned that there are no detailed docs for cTAKES, but is there any doc or guide on how to start writing our own annotator if required? If not, is there a simple annotator that you would suggest we look into, so that we can get a better understanding of annotators before proceeding further?

Thanks in advance.
Regards,
Gandhi

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Thursday, September 21, 2017 7:59 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

Hi Gandhi,

> We guess we are missing out on something as we could not find co-references
> for "200mg". Should we add anymore piper for this?

The piper commands that I sent have everything needed to obtain coreferences. I use them regularly - they are what I used on your example sentence to get the coreferences that I mentioned.

> Also the change mentioned in the thread ...

That is a very old thread and I don't think that it applies to what you are trying to do.

> We also have a requirement to identify the patient names and sex

As James said, ctakes isn't really meant to do this. Ctakes is geared toward extracting clinical data, and to this point names have not fallen into that category. It is more a task for general nlp. There is an opennlp model that can identify names, and there are a few others (I used to see names using GATE). ctakes has wrapped opennlp for other tasks, and you should be able to do the same to adapt a name-finding engine into ctakes.

> cTAKES is unable to identify the dates like 20Aug02 or 20/Aug/02 or 06 / 07 /
> 02 or 27Aug2002

As Chen mentioned, the BackwardsTimeAnnotator module uses an ML model trained on gold data. It isn't perfect. You can add another time annotator on top of it to catch some of the more simply formatted date mentions - there are a lot of them out there. Personally I have used jchronic, as it can be easily tweaked to recognize medically relevant temporal expressions relating to surgery, pharmacology, etc.
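
As a rough illustration of the kind of supplementary date matcher described above, here is a minimal sketch in plain Java that uses a regular expression to pick out compact date mentions such as 20Aug02, 20/Aug/02, 06 / 07 / 02 and 27Aug2002. The class name SimpleDateMentionFinder and its findDates method are hypothetical; they are not part of ctakes or jchronic. Wiring the matches into a UIMA annotator that creates time annotations is left out, and the pattern makes no attempt to validate day or month ranges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a ctakes class): finds simply formatted date mentions with a regex.
class SimpleDateMentionFinder {

    // Three-letter month abbreviations; matched case-insensitively via the flag below.
    private static final String MONTH = "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)";

    // Four-digit or two-digit year.
    private static final String YEAR = "(?:\\d{4}|\\d{2})";

    private static final Pattern DATE = Pattern.compile(
            // day, optional slash, month abbreviation, optional slash, year: 20Aug02, 20/Aug/02
            "\\b\\d{1,2}\\s*/?\\s*" + MONTH + "\\s*/?\\s*" + YEAR + "\\b"
            // all-numeric form; slashes are required so plain numbers are not picked up: 06 / 07 / 02
            + "|\\b\\d{1,2}\\s*/\\s*\\d{1,2}\\s*/\\s*" + YEAR + "\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns the matched date strings in the order they appear in the text.
    static List<String> findDates(String text) {
        List<String> dates = new ArrayList<>();
        Matcher matcher = DATE.matcher(text);
        while (matcher.find()) {
            dates.add(matcher.group());
        }
        return dates;
    }

    public static void main(String[] args) {
        String note = "On 20Aug02, the investigator noted that this patient was suffering worsening fatigue";
        System.out.println(findDates(note));
    }
}
```

A matcher along these lines could run in addition to the ML-based time annotator to catch formats it misses; any overlap between the two sets of results would still have to be reconciled.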
Sean

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Wednesday, September 20, 2017 8:50 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

Hi Gandhi,

I don't have time to go through all of this right now, but I will try to get to it soon. Make sure that you are running the latest version in trunk.

Sean

-----Original Message-----
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Wednesday, September 20, 2017 7:03 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi,

Could someone please help me out with the queries below?

Regards,
Gandhi

-----Original Message-----
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Tuesday, September 19, 2017 8:51 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Sean,

Thanks again for the detailed and prompt response. We were able to run the piper GUI as per your advice. But in the output for the sentence

(The patient started study treatment of Thalomid 200mg ( days 1 - 21 ) , and Epirubicin ,20 mg / m2 ( days 1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment of hepatocellular carcinoma.)

we were not able to find superscript-1 as you mentioned earlier, but could find superscript-2, 3, etc. We guess we are missing out on something as we could not find co-references for "200mg". Should we add anymore piper for this?
Also the change mentioned in the thread - https://urldefense.proofpoint.com/v2/url?u=http-3A__mail-2Darchives.apache.org_mod-5Fmbox_ctakes-2Duser_201403.mbox_-253CCAL6WimrJ-5Fmm1-2BXyggBZv62diYuWP0ScA9VEV8mNHGWe4hSNHQg-40mail.gmail.com-253E&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=JoUDRZHu91gGMslwknPzTQC_UG2LEBLyOfXR3ikwOL0&s=GzhvIkBu4cgyzYN9n6VLe2rz4sJhJzMxDcWyB0BkqAc&e= is required for the drug-ner module to identify drug-ner annotations.

1) We also have a requirement to identify the patient names and sex available in narrative texts. Please let us know how to achieve this, as it's not identifying the proper nouns and their relationship with the patient. E.g. "This male patient named Tom Hardy aged 35 years is participating in a Non-IND study"

2) cTAKES is unable to identify dates like 20Aug02 or 20/Aug/02 or 06 / 07 / 02 or 27Aug2002, as in the example below. Please let us know how to enhance the system to identify such date patterns. E.g. "On 20Aug02, the investigator noted that this patient was suffering worsening fatigue and got tired getting out of his chair"

Regards,
Gandhi

-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Monday, September 18, 2017 10:02 PM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Gandhi,

> So in this case will be able to see drug attributes in the output XML?

As long as you have the DrugMentionAnnotator in your pipeline, you should be able to find drug attributes in the xml output file.

> we also saw some code changes needs to be done to use drug-ner module. Is it
> still valid?

As far as I know there aren't any necessary code changes to get drug ner running. However, I do not normally use drugner, so I can't say for certain.

> Also you mentioned that the drug-ner module is out of date

It can still be used and will produce annotations.
All that I meant was that there may not be many people out there using it. It is not part of the default pipeline.

> You also mentioned that when you run the sentence, the date was identified.
> Where and how exactly did you ran it so that we can check the same?

I run the following in a piper file because I am interested in a lot of modules (I added drugner just for you):

// Advanced Tokenization: Regex sectionization, BIO Sentence Detector (lumper), Paragraphs, Lists
load AdvancedTokenizerPipeline.piper
add ContextDependentTokenizerAnnotator
add POSTagger
// Chunkers
load ChunkerSubPipe.piper
// Default fast dictionary lookup
load DictionarySubPipe.piper
add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
// Cleartk Entity Attributes
load AttributeCleartkSubPipe.piper
// Relations
load RelationSubPipe.piper
// Temporal
load TemporalSubPipe.piper
// Coreferences
load CorefSubPipe.piper
// Html output
add pretty.html.HtmlTextWriter

For information on piper files, see https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CTAKES_Piper-2BFiles&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=JoUDRZHu91gGMslwknPzTQC_UG2LEBLyOfXR3ikwOL0&s=9ueuHYwEywok8byBXEkVjmTWiChmaIY3ryB4Pi6ajRo&e=

I run it in my IDE with:

org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G -p <FileAsAbove>.piper -i org/apache/ctakes/examples/notes -o <OutputDir> --user <MyUmlsUser> --pass <MyUmlsPass>

You can run it from the command line by substituting "org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G" with "bin/runPiperFile". You can also run it through a ctakes 4.0.1 (trunk) gui.
See https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CTAKES_Piper-2BFile-2BSubmitter-2BGUI&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=JoUDRZHu91gGMslwknPzTQC_UG2LEBLyOfXR3ikwOL0&s=VWIrXrfA2dZ8KHOdoizJo-nTx7nPSy4GDOZ7IxQteIQ&e=

> I'm not able to see any clickable option in HTML output

You must have the HtmlTextWriter at the end of your pipeline to produce html files. To keep the xml file output, place "add FileTreeXmiWriter" at the end of the piper.

> Apologies for too many

No worries, we are happy to have your interest!

Sean

-----Original Message-----
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Saturday, September 16, 2017 7:01 AM
To: dev@ctakes.apache.org
Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]

Hi Sean,

Thanks again for the prompt response. We appreciate your input on adding DrugMentionAnnotator.

Actually, we are relying on the pretty printer output just to understand the analysis. Our logic to extract disorders and findings is based on the XML file generated by https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_healthnlp_examples_blob_master_ctakes-2Dtemporal-2Ddemo_src_main_java_org_apache_ctakes_web_client_servlet_DemoServlet.java&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=_MJKBj93YJdd5aa84dBvqtg6o-BKBn7UcbfF660CEBI&s=g8UzBHRoOyn1hoRABKSC6EtPMvwOSSggviRmWCHKti4&e=

So in this case will we be able to see drug attributes in the output XML?
In one of the old posts (https://urldefense.proofpoint.com/v2/url?u=http-3A__mail-2Darchives.apache.org_mod-5Fmbox_ctakes-2Duser_201403.mbox_-253CCAL6WimrJ-5Fmm1-2BXyggBZv62diYuWP0ScA9VEV8mNHGWe4hSNHQg-40mail.gmail.com-253E&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=_MJKBj93YJdd5aa84dBvqtg6o-BKBn7UcbfF660CEBI&s=iT_1UGR98APO80UaZsaCBHseMqF4M4PfItgokD27r5c&e= ) we also saw that some code changes need to be made to use the drug-ner module. Is that still valid?

You also mentioned that the drug-ner module is out of date. Does that mean it cannot be used, or that it may not provide accurate analysis? And what changes need to be made to bring it up to date, so that we can try them if you can assist?

You also mentioned that when you ran the sentence, the date was identified. Where and how exactly did you run it, so that we can check the same?

Also, regarding your explanation of coreference, I'm not able to see any clickable option in the HTML output, so I wanted to understand how we can run and check that too.

Apologies for too many questions, as we are just a week old in NLP and cTAKES. Thanks in advance.

Regards,
Gandhi

This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender or system manager by email immediately if you have received this e-mail by mistake and delete this e-mail from your system. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited and against the law.