[I] [Question]: CUI Question [ctakes]

via GitHub Mon, 28 Apr 2025 10:38:09 -0700


Johnsd11 opened a new issue, #64:
URL: https://github.com/apache/ctakes/issues/64


   ### What’s your question?
   
   Hello,
   I have a question about counting CUIs in cTAKES output. When I process a 
document through cTAKES, the XMI output contains numerous XML tags. The tags 
our lab is interested in are the ones carrying CUIs. For example, the XMI tag
   
   <refsem:UmlsConcept xmi:id="16626" codingScheme="SNOMEDCT_US" code="7092007" 
score="0.0" disambiguated="false" cui="C0025859" tui="T109" 
preferredText="Metoprolol-containing product"/>
   
   would indicate that the CUI C0025859 (Metoprolol-containing product) was 
found in a given document.
   
   In the input document text, I can locate three instances of the drug 
Metoprolol. When I open the cTAKES XMI output in the CVD viewer, each of those 
three mentions has its own ontologyConceptArr with 4 members, looking like this:
   
   // found at org.apache.ctakes.typesystem.type.textsem.EventMention
   //       org.apache.ctakes.typesystem.type.textsem.MedicationMention
   //           ontologyConceptArr = uima.cas.FSArray[4]
   
   <refsem:UmlsConcept xmi:id="16626" codingScheme="SNOMEDCT_US" code="7092007" 
score="0.0" disambiguated="false" cui="C0025859" tui="T109" 
preferredText="Metoprolol-containing product"/>
   <refsem:UmlsConcept xmi:id="16646" codingScheme="SNOMEDCT_US" code="7092007" 
score="0.0" disambiguated="false" cui="C0025859" tui="T121" 
preferredText="Metoprolol-containing product"/>
   <refsem:UmlsConcept xmi:id="16616" codingScheme="SNOMEDCT_US" 
code="372826007" score="0.0" disambiguated="false" cui="C0025859" tui="T109" 
preferredText="Metoprolol-containing product"/>
   <refsem:UmlsConcept xmi:id="16636" codingScheme="SNOMEDCT_US" 
code="372826007" score="0.0" disambiguated="false" cui="C0025859" tui="T121" 
preferredText="Metoprolol-containing product"/>
   
   Although not shown here, a single uima.cas.FSArray can also contain 
different CUIs, even though the array maps to a single string of text in the 
document.
   
   If I walk the XMI file and retrieve all CUIs, the CUI C0025859 is found 12 
times. However, if I extend the JCasAnnotator_ImplBase Java class to extract 
the CUIs from the JCas annotations, it finds this CUI only 3 times.
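
   To make the 12-versus-3 difference concrete, here is a minimal Python sketch 
of the two counting strategies. It is an illustration under assumptions, not a 
description of what cTAKES does internally: it assumes each mention element 
carries an ontologyConceptArr attribute holding space-separated xmi:id 
references, and that the annotator-side count reports each distinct CUI once per 
mention. The function names, the toy document, its xmi:id values, and the 
urn:example namespace URIs are all made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The XMI namespace URI is standard; attribute xmi:id is looked up with it.
XMI_ID = "{http://www.omg.org/XMI}id"


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def count_cuis(xmi_text):
    """Return (count per UmlsConcept element, count per mention)."""
    root = ET.fromstring(xmi_text)

    # Strategy 1: every UmlsConcept element contributes one count.
    concepts = {}
    for el in root.iter():
        if local_name(el.tag) == "UmlsConcept":
            concepts[el.get(XMI_ID)] = el.get("cui")
    per_concept = Counter(concepts.values())

    # Strategy 2: each mention contributes one count per distinct CUI
    # among the concepts it references (assumed attribute layout).
    per_mention = Counter()
    for el in root.iter():
        refs = el.get("ontologyConceptArr")
        if refs is None:
            continue
        per_mention.update({concepts[r] for r in refs.split() if r in concepts})
    return per_concept, per_mention


def toy_xmi():
    """Build a hypothetical XMI: 3 mentions, each with 2 codes x 2 TUIs."""
    parts = []
    next_id = 100  # made-up ids, not taken from real cTAKES output
    for _ in range(3):
        ids = []
        for code in ("7092007", "372826007"):
            for tui in ("T109", "T121"):
                parts.append(
                    f'<refsem:UmlsConcept xmi:id="{next_id}" code="{code}" '
                    f'cui="C0025859" tui="{tui}"/>'
                )
                ids.append(str(next_id))
                next_id += 1
        refs = " ".join(ids)
        parts.append(
            f'<textsem:MedicationMention xmi:id="{next_id}" '
            f'ontologyConceptArr="{refs}"/>'
        )
        next_id += 1
    return (
        '<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" '
        'xmlns:refsem="urn:example:refsem" '  # placeholder namespace
        'xmlns:textsem="urn:example:textsem">'  # placeholder namespace
        + "".join(parts)
        + "</xmi:XMI>"
    )


per_concept, per_mention = count_cuis(toy_xmi())
print(per_concept["C0025859"], per_mention["C0025859"])  # prints "12 3"
```

   On this toy input the first strategy counts one CUI per code/TUI 
combination, while the second counts one CUI per mention in the text, which 
would match the two numbers above if the assumptions hold.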
   
   If part of the output needs to include a count of all CUIs found by cTAKES 
within a given document, which method is correct?
   
   Thanks!
   
   ### Context
   
   _No response_
   
   ### What category does this question fall under?
   
   None
   
   ### Contact Details
   
   _No response_


