Johnsd11 opened a new issue, #61:
URL: https://github.com/apache/ctakes/issues/61

   ### What happened?
   
   I have a minor bug to report, and a question that may point to a more 
serious bug.
   
   If I create a custom dictionary with multiple vocabularies and then run 
cTAKES with it, cTAKES sometimes replaces the vocabulary name with the name of 
the custom dictionary. The attached image1.png shows an example from a run on 
the MIMIC dataset. One CUI that had the incorrect vocabulary name inserted is 
C1548802: when I looked it up in the UMLS Metathesaurus Browser, it had 
‘NOCODE’ for the code. This only seemed to occur with CUIs from the MTH 
vocabulary. Is this something that can be fixed within cTAKES?
   
   The question, and possibly a major bug: we ran the same dataset (50 MIMIC 
notes) twice, once with the multiple-vocabulary custom dictionary described in 
the attached image1.png and once with a custom dictionary that only included 
the SNOMED vocabulary. We then filtered the output from the 
multiple-vocabulary run to keep only the CUIs reported by SNOMED. The two runs 
should have produced the same CUIs, but as the attached Venn diagrams show, 
some CUIs reported with the SNOMED-only dictionary were not reported with the 
multiple-vocabulary dictionary. Do you know why the two outputs would differ?
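
For reference, the comparison described above can be sketched as follows. This is only an illustration, not our exact script: the function name, the `(cui, scheme)` pair layout, and the `SNOMEDCT_US` scheme label are assumptions, and the CUIs in the usage example are made up.

```python
def compare_cui_sets(multi_vocab_pairs, snomed_only_cuis, snomed_scheme="SNOMEDCT_US"):
    """Compare CUIs from a multi-vocabulary run against a SNOMED-only run.

    multi_vocab_pairs: iterable of (cui, coding_scheme) tuples from the
        multi-vocabulary run (hypothetical layout).
    snomed_only_cuis: iterable of CUIs from the SNOMED-only run.
    snomed_scheme: scheme label assumed to mark SNOMED concepts.
    """
    # Keep only the CUIs that the multi-vocabulary run attributed to SNOMED.
    multi_snomed = {cui for cui, scheme in multi_vocab_pairs if scheme == snomed_scheme}
    single = set(snomed_only_cuis)
    return {
        "shared": multi_snomed & single,
        "only_multi_vocab": multi_snomed - single,
        "only_snomed_only": single - multi_snomed,
    }


# Made-up CUIs, purely to show the shape of the result.
example = compare_cui_sets(
    [("C0000001", "SNOMEDCT_US"), ("C0000002", "MTH"), ("C0000003", "SNOMEDCT_US")],
    ["C0000001", "C0000004"],
)
```

If the two runs behaved identically for SNOMED concepts, both `only_multi_vocab` and `only_snomed_only` would be empty. In our case `only_snomed_only` is not.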
   
   We’re running a user installation of cTAKES 4.0.0.1 via
   
   ./bin/runPiperFile.sh -p path/to/piperfile -l path/to/custom_dict.xml -i 
inputDir --xmiOut outputDir
   
   And then extracting the CUIs from the output XMI files.
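
A minimal sketch of the extraction step, in case it helps reproduce the problem. It assumes each concept appears in the XMI as an element whose local name is `UmlsConcept` with `cui` and `codingScheme` attributes; the function name is hypothetical and this is not our exact script.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def cuis_by_scheme(xmi_text):
    """Map each codingScheme value to the set of CUIs reported under it."""
    root = ET.fromstring(xmi_text)
    result = defaultdict(set)
    for elem in root.iter():
        # ElementTree renders namespaced tags as "{uri}LocalName".
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name != "UmlsConcept":
            continue
        cui = elem.get("cui")
        if cui:
            # Assumption: the vocabulary name is stored in codingScheme.
            result[elem.get("codingScheme", "")].add(cui)
    return dict(result)
```

Grouping by `codingScheme` also makes the first problem easy to spot, since any key equal to the custom dictionary's name rather than a vocabulary name stands out.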
   
   Please let me know if I should report this as an issue on the new GitHub 
repository instead of via email.
   
   
   ### Relevant log output
   
   ```shell
   
   ```
   
   ### cTAKES.error.log contents
   
   ```shell
   
   ```
   
   ### Version
   
   5.1.0
   
   ### What operating system are you seeing the problem on?
   
   _No response_
   
   ### Contact Details
   
   _No response_

