Arron,

I would likewise follow Peter's approach, but a more flexible option, and
the one I would recommend, is the RUTA rule-based language:
https://uima.apache.org/ruta.html
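
For the post-processing steps Peter outlines in the quoted message below,
a minimal Python sketch might look like the following. It assumes the CAS
has already been exported to plain Python objects. The Token, Span and
Mention classes, the find_clinic_modifiers function and the "PLACEHOLDER"
code are hypothetical names for illustration, not part of ctakes or UIMA.
It only checks the token immediately before the CLINIC token, which is a
simplification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    """Stand-in for a ConllDep token exported from the CAS (hypothetical)."""
    begin: int
    end: int
    text: str
    pos: str
    tag: Optional[str] = None  # custom tag set by a TokensRegex rule, e.g. "CLINIC"


@dataclass
class Span:
    """Stand-in for a Sentence annotation (hypothetical)."""
    begin: int
    end: int


@dataclass
class Mention:
    """Stand-in for a DiseaseDisorderMention or ProcedureMention (hypothetical)."""
    begin: int
    end: int
    code: str


def find_clinic_modifiers(
    tokens: List[Token],
    sentences: List[Span],
    mentions: List[Mention],
    modifier_pos=frozenset({"JJ"}),
) -> List[Tuple[Mention, Token]]:
    """Return (mention, CLINIC token) pairs where the mention modifies the token."""
    hits = []
    for clinic in tokens:
        if clinic.tag != "CLINIC":
            continue
        # Find the sentence that encloses the tagged token.
        sentence = next(
            (s for s in sentences if s.begin <= clinic.begin and clinic.end <= s.end),
            None,
        )
        if sentence is None:
            continue
        # Tokens of that sentence that end before the CLINIC token, in text order.
        previous = sorted(
            (t for t in tokens
             if sentence.begin <= t.begin and t.end <= clinic.begin),
            key=lambda t: t.begin,
        )
        # Simplification: only the immediately preceding token is considered.
        if not previous or previous[-1].pos not in modifier_pos:
            continue
        modifier = previous[-1]
        # Keep mentions whose offsets match the modifier token exactly.
        for m in mentions:
            if (m.begin, m.end) == (modifier.begin, modifier.end):
                hits.append((m, clinic))
    return hits


# "I went to the Epilepsy Clinic." with hand-assigned offsets and tags.
tokens = [
    Token(0, 1, "I", "PRP"),
    Token(2, 6, "went", "VBD"),
    Token(7, 9, "to", "TO"),
    Token(10, 13, "the", "DT"),
    Token(14, 22, "Epilepsy", "JJ"),  # POS as assumed in the thread
    Token(23, 29, "Clinic", "NN", tag="CLINIC"),
    Token(29, 30, ".", "."),
]
sentences = [Span(0, 30)]
mentions = [Mention(14, 22, "PLACEHOLDER")]  # not a real SNOMED code

hits = find_clinic_modifiers(tokens, sentences, mentions)
```

modifier_pos is a parameter because a real tagger may label "Epilepsy"
as a noun rather than JJ. Each hit pairs the mention to re-code with the
CLINIC token it modifies.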

Best wishes,
Azad

On 21 Oct 2016 21:18, "Abramowitsch, Peter" <pabramowit...@hearst.com>
wrote:
>
> While it is doable, it will need some non-trivial post-processing. The
> approach I suggest below is just an example; there are many ways to
> achieve this, but there is no silver bullet.
>
> To do something like that I suggest incorporating a TokensRegex analysis
> engine in your pipeline.  I have had a lot of success with
> https://github.com/JuleStar/uima-tokens-regex
>
> These allow you to combine standard string based Regex with expressions on
> properties of Annotations - a MetaRegex.  They allow you to choose the
> AnnotationType you prefer to operate with.  (Stanford's TokensRegex for
> CoreNLP is even more powerful)
>
> Write TokensRegex rules that look for ConllDep nodes whose text is like
> clinic/visit/specialist/referral, or whatever you are searching for, and
> assign a unique tag to that token.  Let's say you name the tag CLINIC.
> It's a custom NER, basically.
>
> Output your CAS object and start processing here:
>
> Scan the ConllDep tokens of your document looking for one with the new tag
> CLINIC
>
> If you find one, now find the sentence boundary around this Token, using
> the Sentence Annotations.
>
> Then use the POS attribute of all the ConllDep tokens within that Sentence
> boundary to look for a modifier token(POS=JJ) to the token(POS=NN) that
> you tagged
>
> Now look through the DiseaseDisorderMentions and ProcedureMentions for a
> token whose offsets match the offsets of your JJ ConllDep token.  If you
> have a hit, then you can use it to find the core SNOMED code for Headache
> Clinic, Epilepsy Clinic, Dialysis Clinic etc.   Once you have this you
> will need to manually add the post coordinations to the SNOMED ref pointed
> to by the "(Disease|Procedure)Mention" token.  You can elaborate on this
> theme to capture more complex cases where the modifier is expressed
> differently or is not adjacent to the "CLINIC" token.
>
> I created a framework in Ruby to post process a CAS in this way, although
> I never went as far as generating SNOMED modifiers as they weren't needed
> in my case.  If not Ruby, use some other language that allows efficient
> manipulation of complex data structures in a very few lines of code.
> Otherwise it will get ugly fast.
>
>
> On 10/21/16, 3:03 AM, "Finan, Sean" <sean.fi...@childrens.harvard.edu>
> wrote:
>
> >Hi Arron,
> >
> >
> >
> >Ctakes discovers text words and phrases by lookup using a subset of the
> >UMLS ( https://uts.nlm.nih.gov/home.html ).  ctakes then assigns a code
> >to everything that it finds.
> >
> >
> >
> >While you can employ various workarounds to remove "epilepsy" when it
> >occurs within "epilepsy clinic", these are not part of the standard
> >ctakes distribution or workflow.
> >
> >
> >
> >Sean
> >
> >
> >
> >-----Original Message-----
> >
> >From: Lacey A.S. [mailto:a.s.la...@swansea.ac.uk]
> >
> >Sent: Thursday, October 20, 2016 6:56 PM
> >
> >To: dev@ctakes.apache.org
> >
> >Subject: Post co-ordinated SNOMED-CT with
> >AggregatePlaintextFastUMLSProcessor
> >
> >
> >
> >Hi,
> >
> >
> >
> >Just wondering if someone could point me in the direction of how ctakes
> >produces post-coordinated SNOMED-CT? Using the
> >AggregatePlaintextFastUMLSProcessor, the individual concepts come out
> >quite nicely. However, if you take the phrase "I went to the Epilepsy
> >Clinic", I can't see how the final post-coordinated SNOMED concepts are
> >formed. It appears I have a list of sub-concepts (pre-coordinated) that
> >includes the disorder epilepsy, which merely going to the clinic would
> >not confirm.
> >
> >
> >
> >Any help would be great thanks - enjoying working with ctakes and hoping
> >to include it in an NLP paper on some UK healthcare data.
> >
> >
> >
> >Arron Lacey
> >
> >Research Analyst
> >
> >SAIL Databank
> >
> >Swansea Neuroscience Research Group
> >
> >01792 602023
> >
> >a.s.la...@swansea.ac.uk
> >
> >
> >
>
