Re: [I] [Question]: Initial CTakes analysis [ctakes]

via GitHub Mon, 28 Apr 2025 10:24:06 -0700


Johnsd11 commented on issue #56:
URL: https://github.com/apache/ctakes/issues/56#issuecomment-2835961166


   > I am looking for a NLP to read pathology reports and extract cancer
   > related site, histology, stage and any other DX/RX data available. In
   > looking at CTakes, I have a few questions;
   >
   > - Is CTakes an appropriate tool to automate this task?
   
   I wrote a commercial surgical-pathology coding module some years ago, and
   could imagine doing it in cTAKES.
   Here's my two cents to add to the wealth of information Peter has already
   provided.
   Best of luck.
   
   
   
   > Where can I find an "executive overview" (30,000 foot view) of how the
   > CTakes works?
   
   As Peter said, there's a lot of documentation out there!
   Videos here: https://ctakes.apache.org/tutorials.html
   Key point: it's built on top of UIMA (https://uima.apache.org/),
   which ingests and annotates data from any source, letting you mix, match,
   and create your own annotators to build chains of analyses.
   The cTAKES value-adds include a clinical type system and a spiffy
   dictionary (see below).
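
To make the "chains of analyses" idea concrete, here is a minimal sketch in plain Python. It is not the UIMA or cTAKES API: the `Document`, `Annotation`, `sentence_annotator`, `make_keyword_annotator` and `run_pipeline` names are all hypothetical, and the `Document` class only loosely plays the role of a UIMA CAS (the text plus every annotation added so far).

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # e.g. "Sentence" or "Keyword" (illustrative type names)
    begin: int  # character offset where the span starts
    end: int    # character offset just past the span


@dataclass
class Document:
    # Hypothetical stand-in for a shared analysis structure:
    # the raw text plus all annotations produced so far.
    text: str
    annotations: list = field(default_factory=list)

    def select(self, kind):
        return [a for a in self.annotations if a.kind == kind]

    def covered_text(self, ann):
        return self.text[ann.begin:ann.end]


def sentence_annotator(doc):
    # Naive splitter: a sentence runs from a non-space character to the next period.
    for m in re.finditer(r"\S[^.]*\.", doc.text):
        doc.annotations.append(Annotation("Sentence", m.start(), m.end()))


def make_keyword_annotator(terms):
    # Returns an annotator that marks every case-insensitive occurrence of each term.
    def keyword_annotator(doc):
        for term in terms:
            for m in re.finditer(re.escape(term), doc.text, re.IGNORECASE):
                doc.annotations.append(Annotation("Keyword", m.start(), m.end()))
    return keyword_annotator


def run_pipeline(doc, annotators):
    # Each annotator sees the text plus everything earlier annotators added.
    for annotate in annotators:
        annotate(doc)
    return doc


doc = run_pipeline(
    Document("Left breast biopsy. Invasive ductal carcinoma identified."),
    [sentence_annotator, make_keyword_annotator(["breast", "carcinoma"])],
)
for ann in doc.annotations:
    print(ann.kind, repr(doc.covered_text(ann)))
```

Swapping, reordering or adding annotators only changes the list passed to `run_pipeline`, which is the mix-and-match property described above.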
   
   
   
   > My ignorance regarding NLP algorithms like CTakes is whether it is
   > keyword driven, or it is self learning.
   
   cTAKES is *not* "self-learning"; you have to tell it exactly what
   information you want to extract from where.
   
   Pro: High precision; explainable; you won't get the right answer for the
   wrong reason.
   Con: Low recall; brittle; you may not get answers at all! If you're
   processing unpredictable document formats from many different facilities,
   it can be hard to generalize over them.
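
A small illustration of that trade-off, as a sketch only. The rule and the report snippets below are made up for the example and do not come from cTAKES: a hand-written rule returns exactly what it was told to look for, and silently returns nothing when a facility words the same fact differently.

```python
import re

# Hypothetical rule: the word "stage" followed by a Roman numeral I to IV.
STAGE_RULE = re.compile(r"\bstage\s+(IV|III|II|I)\b", re.IGNORECASE)


def extract_stage(report_text):
    # Returns the Roman numeral if the rule fires, otherwise None.
    match = STAGE_RULE.search(report_text)
    return match.group(1).upper() if match else None


# Invented snippets standing in for reports from different facilities.
for snippet in ("Pathologic stage II.", "pT2 N0 MX", "Stage IIIA disease"):
    print(repr(snippet), "->", extract_stage(snippet))
```

The first snippet matches because it has the shape the rule expects. The TNM-style snippet and the substage snippet both yield `None`, so each new format needs its own rule.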
   
   
   
   > I currently have a homegrown application which looks for keywords and
   > negation modifiers within a certain distance from the keywords
   
   cTAKES can certainly help with that.
   
   - *Keywords*
     cTAKES lets you use the NLM's UMLS Metathesaurus
     <https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/index.html>,
     using the dictionary framework Peter mentioned:
     https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+Fast+Dictionary+Lookup
     These sources may be useful in building your custom dictionary:
       - the NCI Thesaurus:
         https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/NCI/index.html
       - CPT, if you want codes from there:
         https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/CPT/index.html
       - For anatomy, I'm not familiar with the "anatomical site annotator"
         Peter alludes to, but the FMA is better structured than SNOMED:
         https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/FMA/index.html
   - *Negation*
     Several annotators available:
     https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+Negation+Annotators
     Distance-from-keywords is a start, but sentence detection and shallow
     parsing both help.
     I like the ctakes-ytex-uima NegexAnnotator and SentenceDetector.
   - *Document structure*
     I found header detection to be crucial in processing pathology reports:
     tracking specimens through a document, extracting tumor info from
     tables, etc.
     The cTAKES RegexSectionizer might work for you.
     https://ctakes.apache.org/apidocs/4.0.0/org/apache/ctakes/core/ae/RegexSectionizer.html
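
The three ideas above (keywords, sentence-scoped negation, header detection) can be sketched together in plain Python. This is an illustration under assumptions, not the cTAKES RegexSectionizer or NegexAnnotator: the header pattern (an all-caps line ending in a colon), the negation cue list, the sample report, and the `split_sections` and `find_mentions` helpers are all hypothetical.

```python
import re

# Assumed header convention: an all-caps line ending in a colon.
HEADER = re.compile(r"^([A-Z][A-Z ]+):[ \t]*$", re.MULTILINE)

# Small, illustrative cue list; real NegEx-style lists are much longer.
NEGATION_CUE = re.compile(r"\b(no|not|without|negative for)\b", re.IGNORECASE)


def split_sections(text):
    # Returns (header, body) pairs; each body runs up to the next header.
    headers = list(HEADER.finditer(text))
    sections = []
    for i, h in enumerate(headers):
        body_end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        sections.append((h.group(1), text[h.end():body_end].strip()))
    return sections


def find_mentions(body, keywords):
    # A keyword counts as negated only if a cue precedes it in the SAME sentence.
    mentions = []
    for sentence in re.split(r"(?<=[.!?])\s+", body):
        for keyword in keywords:
            hit = re.search(re.escape(keyword), sentence, re.IGNORECASE)
            if hit is None:
                continue
            negated = NEGATION_CUE.search(sentence[:hit.start()]) is not None
            mentions.append((keyword, negated))
    return mentions


# Made-up report text for the example.
REPORT = (
    "SPECIMEN:\n"
    "Left breast, core biopsy.\n"
    "\n"
    "FINAL DIAGNOSIS:\n"
    "No lymphovascular invasion. Invasive ductal carcinoma.\n"
)

for header, body in split_sections(REPORT):
    print(header, find_mentions(body, ["carcinoma", "lymphovascular invasion"]))
```

In the sample, "No" sits only a few dozen characters before "carcinoma", so a pure distance window could wrongly negate it. Stopping the cue at the sentence boundary avoids that, which is why sentence detection helps.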


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
