James, I haven't done it myself, so I don't know exactly how the config needs to change, but I know roughly where to look. In LookupDesc_Db.xml, find the <lookupBinding> tag with idRef = DICT_UMLS_MS. Under its <lookupConsumer> section you'll see that the codingScheme is SNOMED. I believe this is where the actual dictionary filtering is done. There is also a consumer class called org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl and a mapPrepStmt field with a SQL query that might need changing. That is where I would start looking. I'm not sure whether you would need to write a new consumer class, or what values the codingScheme field can take, but hopefully this helps you get started until someone else chimes in with more detailed info!
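If it helps to see what a binding is currently configured with, something like the following could list the consumer settings for one dictionary. Treat it as a rough sketch only: the XML fragment in SAMPLE is a hand-written guess at the layout, not copied from the real LookupDesc_Db.xml. The <sketch> root, the <dictionaryRef> element, the <property key/value> shape, the SQL text, and the LookupConsumerSketch class itself are all assumptions made for illustration. Only lookupBinding, lookupConsumer, codingScheme, mapPrepStmt, DICT_UMLS_MS, and the consumer class name come from this thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class LookupConsumerSketch {

    // Hypothetical fragment: element shapes and the SQL text are guesses,
    // NOT the contents of the real LookupDesc_Db.xml.
    static final String SAMPLE =
        "<sketch>"
        + "<lookupBinding>"
        + "<dictionaryRef idRef=\"DICT_UMLS_MS\"/>"
        + "<lookupConsumer className=\"org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl\">"
        + "<properties>"
        + "<property key=\"codingScheme\" value=\"SNOMED\"/>"
        + "<property key=\"mapPrepStmt\" value=\"select code from hypothetical_table where cui=?\"/>"
        + "</properties>"
        + "</lookupConsumer>"
        + "</lookupBinding>"
        + "</sketch>";

    // Collects the consumer class name and its key/value properties for the
    // binding that references the given dictionary id. Returns an empty map
    // when no binding matches.
    static Map<String, String> consumerProperties(String xml, String dictionaryId) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> result = new LinkedHashMap<>();
        NodeList bindings = doc.getElementsByTagName("lookupBinding");
        for (int i = 0; i < bindings.getLength(); i++) {
            Element binding = (Element) bindings.item(i);
            if (!refersTo(binding, dictionaryId)) {
                continue;
            }
            NodeList consumers = binding.getElementsByTagName("lookupConsumer");
            for (int j = 0; j < consumers.getLength(); j++) {
                Element consumer = (Element) consumers.item(j);
                result.put("className", consumer.getAttribute("className"));
                NodeList props = consumer.getElementsByTagName("property");
                for (int k = 0; k < props.getLength(); k++) {
                    Element prop = (Element) props.item(k);
                    result.put(prop.getAttribute("key"), prop.getAttribute("value"));
                }
            }
        }
        return result;
    }

    // True when the binding contains a dictionaryRef with the given idRef.
    static boolean refersTo(Element binding, String dictionaryId) {
        NodeList refs = binding.getElementsByTagName("dictionaryRef");
        for (int i = 0; i < refs.getLength(); i++) {
            if (dictionaryId.equals(((Element) refs.item(i)).getAttribute("idRef"))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        consumerProperties(SAMPLE, "DICT_UMLS_MS")
            .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Pointing the same method at the contents of the real descriptor (adjusting the element names to whatever the file actually uses) should show the codingScheme and mapPrepStmt values that would need to change for ICD9.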
Tim

On 09/15/2013 08:39 PM, Vogel, James wrote:
> Any more guidance you can give about the nature of the changes to the config
> and impl that would need to be made to get the ICD9 codes?
>
> -----Original Message-----
> From: Pei Chen [mailto:chen...@apache.org]
> Sent: Wednesday, September 04, 2013 1:02 PM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
>
> Ted,
>
>> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> familiar with how to access that information: In the example I've
>> described below, where would I locate the ICD9 for a specific entity?
> Even though ICD9 is included in the lookup, IIRC, cTAKES by default is
> configured [1] to return/store only concepts [2] that have a SNOMEDCT code
> or RxNorm code.
>
> [1]
> http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml
>
> [2]
> http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/ae/UmlsToSnomedConsumerImpl.java
>
> If you would like it to return ICD9 codes, one would need to
> modify/configure the above...
>
> --Pei
>
>
> On Wed, Sep 4, 2013 at 11:55 AM, Assur, Ted
> <theodore.as...@providence.org> wrote:
>
>> Thanks for looking into this, it's been puzzling me.
>>
>> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> familiar with how to access that information: In the example I've described
>> below, where would I locate the ICD9 for a specific entity?
>>
>> Thank you
>>
>> Ted
>>
>> -----Original Message-----
>> From: Pei Chen [mailto:chen...@apache.org]
>> Sent: Tuesday, September 03, 2013 7:13 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: specificity in selecting EntityMentions when using
>> AggregatePlaintextUMLSProcessor
>>
>> You're right, it should have gotten "CIN I" - that's a strange one,
>> probably needs to be debugged/looked into further...
>>
>> On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy <
>> timothy.mil...@childrens.harvard.edu> wrote:
>>> Ah. So it will get
>>> CIN 2 (in SNOMED)
>>> CIN III (in SNOMED)
>>> CIN 3 (in SNOMED)
>>>
>>> but the rest are not in SNOMED?
>>>
>>> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
>>> (though I don't fully understand what all the symbols mean in the UMLS
>>> browser).
>>>
>>>> CIN I - Cervical intraepithelial neoplasia 1
>>>> [A3002690/SNOMEDCT/SY/285836003]
>>>
>>> On 09/03/2013 09:55 PM, Pei Chen wrote:
>>>> It has the correct parse (POS, chunks, and lookup window), but some of
>>>> the terms do not exist in SNOMED: CIN 2 - Cervical intraepithelial
>>>> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists, but not CIN II.
>>>> CIN III [A3333965/SNOMEDCT/SY/20365006] also exists, which is why it
>>>> was able to perform the lookup successfully.
>>>> Note that CIN II synonyms do exist in other UMLS thesauruses such as
>>>> MEDCIN and CCPSS, though. However, the bundled cTAKES dictionaries only
>>>> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9), IIRC.
>>>>
>>>> --Pei
>>>>
>>>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy
>>>> <timothy.mil...@childrens.harvard.edu> wrote:
>>>>> That is a good question, Ted!
>>>>>
>>>>> I tried it with a simple context: "The patient has a CIN III." I'm
>>>>> not sure if that is a correct context but I was able to duplicate
>>>>> your findings. (Finds a CUI for CIN III but not if you change it to
>>>>> CIN II)
>>>>>
>>>>> My first thought was that it is the chunker.
>>>>> But the chunker seems
>>>>> to get it right, as CIN II and CIN III are both called NPs, and
>>>>> similarly the LookupWindowAnnotator handles them both identically.
>>>>> So that suggests it is a problem with the actual lookup of the
>>>>> tokens in the LookupWindow.
>>>>>
>>>>> That's all I can do for now but maybe someone else who knows more
>>>>> about its behavior offhand will have an idea.
>>>>>
>>>>> Tim
>>>>>
>>>>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
>>>>>> I'm trying to understand what would prevent the
>>>>>> AggregatePlaintextUMLSProcessor AE from correctly parsing specific
>>>>>> problems that are defined in the UMLS version used by cTAKES.
>>>>>> For example,
>>>>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is
>>>>>> parsed out as UMLS CUI C0206708.
>>>>>> CIN comes in 3 grades, 1, 2 and 3. Sometimes this is reported with
>>>>>> Roman Numerals, I, II, and III.
>>>>>> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI
>>>>>> C0851140: "Carcinoma in situ of uterine cervix."
>>>>>> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II
>>>>>> as their correct concepts, "Cervical intraepithelial neoplasia
>>>>>> grade 1" and "Cervical intraepithelial neoplasia grade 2"
>>>>>> respectively.
>>>>>> Is there a way to tune the detection of UMLS concepts?
>>>>>>
>>>>>> --------------------------------------------
>>>>>> Ted Assur
>>>>>> IT Solutions Architect for Cancer Research
>>>>>> Providence Health & Services
>>>>>> ted.as...@providence.org
>>>>>> 503-215-6476
>>>>>>
>>>>>> Crede, ut intelligas.
>>>>>> Intellego, ut credam.
>>>>>>
>>>>>> ________________________________
>>>>>>
>>>>>> This message is intended for the sole use of the addressee, and may
>>>>>> contain information that is privileged, confidential and exempt from
>>>>>> disclosure under applicable law.
>>>>>> If you are not the addressee you are hereby notified that you may
>>>>>> not use, copy, disclose, or distribute to anyone the message or any
>>>>>> information contained in the message. If you have received this
>>>>>> message in error, please immediately advise the sender by reply
>>>>>> email and delete this message.