Arron, I would likewise follow Peter's approach, but a more flexible option, and the one I would recommend, is to use the RUTA rule-based language: https://uima.apache.org/ruta.html
Best wishes,
Azad

On 21 Oct 2016 21:18, "Abramowitsch, Peter" <pabramowit...@hearst.com> wrote:
>
> While it is doable, it will need some non-trivial post-processing. The
> approach I suggest below is just an example; there are many ways to
> achieve this, but there is no silver bullet.
>
> To do something like that, I suggest incorporating a TokensRegex analysis
> engine in your pipeline. I have had a lot of success with
> https://github.com/JuleStar/uima-tokens-regex
>
> These allow you to combine standard string-based regex with expressions on
> properties of Annotations: a MetaRegex. They allow you to choose the
> AnnotationType you prefer to operate with. (Stanford's TokensRegex for
> CoreNLP is even more powerful.)
>
> Write TokensRegex rules that look for ConllDep nodes whose text is like
> clinic/visit/specialist/referral, or whatever you are searching for, and
> assign a unique tag to that token. Let's say you name the tag CLINIC.
> It's a custom NER, basically.
>
> Output your CAS object and start processing here:
>
> Scan the ConllDep tokens of your document looking for one with the new tag
> CLINIC.
>
> If you find one, find the sentence boundary around this token, using
> the Sentence Annotations.
>
> Then use the POS attribute of all the ConllDep tokens within that sentence
> boundary to look for a modifier token (POS=JJ) to the token (POS=NN) that
> you tagged.
>
> Now look through the DiseaseDisorderMentions and ProcedureMentions for a
> token whose offsets match the offsets of your JJ ConllDep token. If you
> have a hit, then you can use it to find the core SNOMED code for Headache
> Clinic, Epilepsy Clinic, Dialysis Clinic, etc. Once you have this, you
> will need to manually add the post-coordinations to the SNOMED ref pointed
> to by the "(Disease|Procedure)Mention" token. You can elaborate on this
> theme to capture more complex cases where the modifier is expressed
> differently or is not adjacent to the "CLINIC" token.
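The post-processing steps quoted above (find a CLINIC-tagged token, locate its sentence, look for a JJ modifier, then match that modifier's offsets against the mentions) could be sketched roughly as follows. This is only an illustration in Python: the `Token` and `Span` classes and the `find_clinic_modifiers` function are hypothetical stand-ins for data exported from the CAS, not cTAKES or UIMA APIs, and the POS tags and the mention label in the sample data are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Token:
    """Hypothetical stand-in for a ConllDep token exported from the CAS."""
    begin: int
    end: int
    text: str
    pos: str
    tag: Optional[str] = None  # custom tag set by the regex rules, e.g. "CLINIC"


@dataclass
class Span:
    """Hypothetical stand-in for a Sentence or a (Disease|Procedure)Mention."""
    begin: int
    end: int
    label: str = ""  # placeholder for whatever concept reference the mention carries


def covers(outer: Span, inner: Token) -> bool:
    """True if the token lies entirely inside the span."""
    return outer.begin <= inner.begin and inner.end <= outer.end


def find_clinic_modifiers(
    tokens: Sequence[Token],
    sentences: Sequence[Span],
    mentions: Sequence[Span],
    modifier_pos: Tuple[str, ...] = ("JJ",),
) -> List[Tuple[Span, Token]]:
    """Pair each CLINIC-tagged token with mentions that modify it."""
    hits: List[Tuple[Span, Token]] = []
    for clinic in tokens:
        if clinic.tag != "CLINIC":
            continue
        # Sentence boundary around the tagged token.
        sentence = next((s for s in sentences if covers(s, clinic)), None)
        if sentence is None:
            continue
        # Candidate modifier tokens within the same sentence.
        for tok in tokens:
            if tok is clinic or tok.pos not in modifier_pos:
                continue
            if not covers(sentence, tok):
                continue
            # A mention whose offsets match the modifier's offsets is a hit.
            for mention in mentions:
                if (mention.begin, mention.end) == (tok.begin, tok.end):
                    hits.append((mention, clinic))
    return hits


# Sample data for "I went to the Epilepsy Clinic" (tags and label are assumed).
tokens = [
    Token(0, 1, "I", "PRP"),
    Token(2, 6, "went", "VBD"),
    Token(7, 9, "to", "TO"),
    Token(10, 13, "the", "DT"),
    Token(14, 22, "Epilepsy", "JJ"),
    Token(23, 29, "Clinic", "NN", tag="CLINIC"),
]
sentences = [Span(0, 29)]
mentions = [Span(14, 22, "epilepsy")]

hits = find_clinic_modifiers(tokens, sentences, mentions)
print([(mention.label, clinic.text) for mention, clinic in hits])
```

The `modifier_pos` parameter is there so that other tags can be tried if the tagger does not label the modifier as JJ. Each hit only identifies the mention to re-interpret; adding the post-coordination itself is still a separate, manual step.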
>
> I created a framework in Ruby to post-process a CAS in this way, although
> I never went as far as generating SNOMED modifiers as they weren't needed
> in my case. If not Ruby, use some other language that allows efficient
> manipulation of complex data structures in very few lines of code.
> Otherwise it will get ugly fast.
>
>
> On 10/21/16, 3:03 AM, "Finan, Sean" <sean.fi...@childrens.harvard.edu>
> wrote:
>
> >Hi Arron,
> >
> >cTAKES discovers text words and phrases by lookup using a subset of the
> >UMLS (https://uts.nlm.nih.gov/home.html). cTAKES then assigns a code to
> >everything that it finds.
> >
> >While you can employ various workarounds to remove "epilepsy" when it
> >occurs within "epilepsy clinic", these are not part of the standard cTAKES
> >distribution or workflow.
> >
> >Sean
> >
> >-----Original Message-----
> >From: Lacey A.S. [mailto:a.s.la...@swansea.ac.uk]
> >Sent: Thursday, October 20, 2016 6:56 PM
> >To: dev@ctakes.apache.org
> >Subject: Post co-ordinated SNOMED-CT with
> >AggregatePlaintextFastUMLSProcessor
> >
> >Hi,
> >
> >Just wondering if someone could point me in the direction of how cTAKES
> >produces post-coordinated SNOMED-CT? Using the
> >AggregatePlaintextFastUMLSProcessor the individual concepts come out
> >quite nicely; however, if you take the following phrase "I went to the
> >Epilepsy Clinic", I can't see how the final post-coordinated SNOMED
> >concepts are formed, and it appears I have a list of sub-concepts
> >(pre-coordinated) that includes the disorder epilepsy (which merely going
> >to the clinic would not confirm).
> >
> >Any help would be great, thanks - enjoying working with cTAKES and hoping
> >to include it in an NLP paper on some UK healthcare data.
> >
> >Arron Lacey
> >Research Analyst
> >SAIL Databank
> >Swansea Neuroscience Research Group
> >01792 602023
> >a.s.la...@swansea.ac.uk