That is very informative - Thanks Chen!

-----Original Message-----
From: Lin, Chen [mailto:chen....@childrens.harvard.edu]
Sent: Wednesday, September 20, 2017 3:37 PM
To: dev@ctakes.apache.org
Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

Hi Gandhi,

As for the error in EventTimeRelationAnnotator, the reason is that the time-class attribute value for a temporal expression mention is missing. When we developed this annotator, we used the time-class from the gold annotation as a feature to help the classifier. If this feature is missing, the system can still predict event-time relations, but the performance will drop a little. Our test on SemEval 2015 data shows that if the temporal attributes are missing, the system performance drops by 0.012 in F-score (Table 4 of https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009920/ ).

If you really want this time-class feature, please add "BackwardTimeAnnotator" to your processing pipeline, which will annotate temporal expressions and predict their time classes. Please keep in mind that this annotator is not 100% accurate either.

Best,
Chen

On 9/20/17, 2:43 PM, "Gandhi Rajan Natarajan" <gandhi.natara...@arisglobal.com> wrote:

>Hi James & Sean, Thanks for your support.
>
>Regarding point-1, we don't have any database or metadata from which to get the name or sex information. Is it not possible to achieve this in cTAKES by any other means? If so, what other approach would be feasible to implement alongside cTAKES? We need this information very much for our requirement.
>
>Regarding point-2, I will check what you have suggested. But isn't date analysis part of the temporal module? Do you mean to say that if we use the drug ner module, ContextDependentTokenizerAnnotator will be overwritten for date identification? Also, while using the piper GUI to run the analysis, we could see the following message in the console:
>
>21 Sep 2017 00:08:04 INFO EventTimeRelationAnnotator - Starting processing ...
>Null value found in Feature(<Time-Class->, <NULL>)
>
>Could someone explain this error and how to overcome it?
>
>Regards,
>Gandhi
>
>-----Original Message-----
>From: James Masanz [mailto:masanz.ja...@gmail.com]
>Sent: Wednesday, September 20, 2017 8:41 PM
>To: dev@ctakes.apache.org
>Subject: Re: Enabling drugner pipeline and identifying dates [EXTERNAL]
>
>1) I would typically not use cTAKES for extracting patient names or sex. Is there any database or metadata that you can get that information from?
>
>2) Dates are found by the ContextDependentTokenizerAnnotator, which uses DateFSM.java in package org.apache.ctakes.core.fsm.machine.
>I believe drug ner uses DateParser in org.apache.ctakes.core.util to interpret the date annotations. So you might need to modify both DateFSM and DateParser.
>
>On Tue, Sep 19, 2017 at 11:20 AM, Gandhi Rajan Natarajan <gandhi.natara...@arisglobal.com> wrote:
>
>> Hi Sean,
>>
>> Thanks again for the detailed and prompt response. We were able to run the piper GUI as per your advice. But in the output (The patient started study treatment of Thalomid 200mg ( days 1 - 21 ) , and Epirubicin ,20 mg / m2 ( days 1 , 8 , and 15 ) on 06 / 07 / 02 for the treatment of hepatocellular carcinoma.), we were not able to find superscript-1 as you mentioned earlier, but could find superscript-2, 3 etc. We guess we are missing something, as we could not find co-references for "200mg". Should we add any more piper for this?
>>
>> Also, is the change mentioned in the thread http://mail-archives.apache.org/mod_mbox/ctakes-user/201403.mbox/%3CCAL6WimrJ_mm1+xyggbzv62diyuwp0sca9vev8mnhgwe4hsn...@mail.gmail.com%3E required for the drug-ner module to identify drug-ner annotations?
>>
>> 1) We also have a requirement to identify the patient names and sex available in narrative texts. Please let us know how to achieve this, as cTAKES is not identifying the proper nouns and their relationship with the patient.
>> E.g. "This male patient named Tom Hardy aged 35 years is participating in a Non-IND study"
>>
>> 2) cTAKES is unable to identify dates like 20Aug02 or 20/Aug/02 or 06 / 07 / 02 or 27Aug2002, as in the example below. Please let us know how to enhance the system to identify such date patterns.
>> E.g. "On 20Aug02, the investigator noted that this patient was suffering worsening fatigue and got tired getting out of his chair"
>>
>> Regards,
>> Gandhi
>>
>> -----Original Message-----
>> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
>> Sent: Monday, September 18, 2017 10:02 PM
>> To: dev@ctakes.apache.org
>> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
>>
>> Hi Gandhi,
>>
>> > So in this case will be able to see drug attributes in the output XML?
>> As long as you have the DrugMentionAnnotator in your pipeline you should be able to find drug attributes in the xml output file.
>>
>> > we also saw some code changes needs to be done to use drug-ner module. Is it still valid?
>> As far as I know there aren't any necessary code changes to get drug ner running. However, I do not normally use drugner so I can't say for certain.
>>
>> > Also you mentioned that the drug-ner module is out of date
>> It can still be used and will produce annotations. All that I meant was that there may not be many people out there using it. It is not part of the default pipeline.
>>
>> > You also mentioned that when you run the sentence, the date was identified. Where and how exactly did you ran it so that we can check the same?
>> I run the following in a piper file because I am interested in a lot of modules (I added drugner just for you):
>>
>> // Advanced Tokenization: Regex sectionization, BIO Sentence Detector (lumper), Paragraphs, Lists
>> load AdvancedTokenizerPipeline.piper
>> add ContextDependentTokenizerAnnotator
>> add POSTagger
>> // Chunkers
>> load ChunkerSubPipe.piper
>> // Default fast dictionary lookup
>> load DictionarySubPipe.piper
>> add org.apache.ctakes.drugner.ae.DrugMentionAnnotator
>> // Cleartk Entity Attributes
>> load AttributeCleartkSubPipe.piper
>> // Relations
>> load RelationSubPipe.piper
>> // Temporal
>> load TemporalSubPipe.piper
>> // Coreferences
>> load CorefSubPipe.piper
>> // Html output
>> add pretty.html.HtmlTextWriter
>>
>> For information on piper files, see https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
>> I run it in my IDE with:
>> org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G -p <FileAsAbove>.piper -i org/apache/ctakes/examples/notes -o <OutputDir> --user <MyUmlsUser> --pass <MyUmlsPass>
>> You can run it by command line by substituting "org.apache.ctakes.core.pipeline.PiperFileRunner -Xmx3G" with "bin/runPiperFile".
>> You can also run it through a ctakes 4.01 (trunk) gui.
>> See https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI
>>
>> > I'm not able to see any clickable option in HTML output
>> You must have the HtmlTextWriter at the end of your pipeline to produce html files. To keep the xml file output, place "add FileTreeXmiWriter" at the end of the piper.
>>
>> > Apologizes for too many
>> No worries, we are happy to have your interest!
>>
>> Sean
>>
>> -----Original Message-----
>> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
>> Sent: Saturday, September 16, 2017 7:01 AM
>> To: dev@ctakes.apache.org
>> Subject: RE: Enabling drugner pipeline and identifying dates [EXTERNAL]
>>
>> Hi Sean,
>>
>> Thanks again for the prompt response. We appreciate your input on adding DrugMentionAnnotator. Actually, we are relying on the pretty printer output just to understand the analysis. Our logic to extract disorders and findings is based on the XML file generated by https://github.com/healthnlp/examples/blob/master/ctakes-temporal-demo/src/main/java/org/apache/ctakes/web/client/servlet/DemoServlet.java
>> So in this case, will we be able to see drug attributes in the output XML?
>>
>> In one of the old posts ( http://mail-archives.apache.org/mod_mbox/ctakes-user/201403.mbox/%3CCAL6WimrJ_mm1+XyggBZv62diYuWP0ScA9VEV8mNHGWe4hSNHQg@mail.gmail.com%3E ) we also saw that some code changes need to be done to use the drug-ner module. Is that still valid? Also, you mentioned that the drug-ner module is out of date. Does that mean it cannot be used, or that it may not provide accurate analysis? And what changes need to be made to bring it up to date, so that we can try them if you can assist?
>>
>> You also mentioned that when you ran the sentence, the date was identified. Where and how exactly did you run it, so that we can check the same? Also, regarding your explanation of coreference, I'm not able to see any clickable option in the HTML output, so we wanted to understand how we can run and check that too.
>>
>> Apologies for too many questions, as we are just a week old in NLP and cTAKES. Thanks in advance.
>>
>> Regards,
>> Gandhi
>>
>> This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender or system manager by email immediately if you have received this e-mail by mistake and delete this e-mail from your system. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited and against the law.
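
The date formats raised in point 2 of the thread (20Aug02, 20/Aug/02, 06 / 07 / 02, 27Aug2002) can be illustrated with a small standalone Java sketch. It is only an assumption about what an extended date pattern might accept: the class DatePatternSketch and its findDates method are hypothetical names, they are not part of cTAKES, and the code does not use the DateFSM or DateParser classes that James mentions. It may still help when deciding which strings a modified DateFSM would need to cover.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not part of cTAKES: finds the compact date
 * formats mentioned in this thread with a plain regular expression.
 */
public class DatePatternSketch {

    // Three-letter English month abbreviations (case-sensitive by assumption).
    private static final String MONTH =
        "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)";

    // First alternative: day, month abbreviation, 2- or 4-digit year,
    // with optional slashes and spaces (20Aug02, 20/Aug/02, 27Aug2002).
    // Second alternative: three numeric fields separated by slashes,
    // with optional spaces around each slash (06 / 07 / 02).
    private static final Pattern DATE = Pattern.compile(
        "\\b\\d{1,2}\\s*/?\\s*" + MONTH + "\\s*/?\\s*(?:\\d{4}|\\d{2})\\b"
        + "|\\b\\d{1,2}\\s*/\\s*\\d{1,2}\\s*/\\s*(?:\\d{4}|\\d{2})\\b");

    /** Returns every substring of text that matches one of the date shapes. */
    public static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String note = "On 20Aug02, the investigator noted fatigue; "
            + "treatment began on 06 / 07 / 02 and ended 27Aug2002.";
        // Print each candidate date on its own line.
        for (String date : findDates(note)) {
            System.out.println(date);
        }
    }
}
```

The sketch only locates candidate strings. It does not decide whether 06 / 07 / 02 means June 7 or July 6, and it does not normalize two-digit years, so that interpretation step would still belong in whatever plays the role of DateParser.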