Hi James,

Glad you were able to make cTAKES work for your use case.
The UMLS subset that is currently included in the resources should be:

* International Classification of Diseases, Ninth Revision, Clinical Modification, 2012
  ICD9CM_2012 ICD9CM ENG 0 20997
* International Classification of Diseases, Ninth Revision, Clinical Modification, Metathesaurus additional entry terms, 2012
  MTHICD9_2012 ICD9CM ENG 0 16304
* Medical Subject Headings, 2012_2011_09_09
  MSH2012_2011_09_09 MSH ENG 0 321367
* NCI Thesaurus, 2011_02D
  NCI2011_02D NCI ENG 0 90135
* SNOMED Clinical Terms, 2011_07_31
  SNOMEDCT_2011_07_31 SNOMEDCT ENG 9 324494
And also RxNorm for the rxnorm_index folder.
(I think there was a readme about it; if not, let's at least add it to the User FAQs?)

--Pei

> -----Original Message-----
> From: Vogel, James [mailto:jvo...@activehealth.net]
> Sent: Monday, September 30, 2013 11:41 AM
> To: dev@ctakes.apache.org
> Subject: RE: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
>
> That worked and I see how I can change the code to do both SNOMED and
> ICD9.
> I added an index by doing:
> CREATE INDEX 'umls_ms_2011ab_cui' ON umls_ms_2011ab (cui);
> I needed to change the database from 'read-only'; is that going to cause
> any other problems?
>
> What subset of ICD9 is in the dictionary?
>
> From: Pei Chen [mailto:chen...@apache.org]
> Sent: Friday, September 27, 2013 11:26 PM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
>
> James,
> Obviously it would be best to customize the code and/or the dictionary for
> your particular case.
> But if you want to try something that will work without any code changes,
> you can try the below in your LookupDesc_Db.xml. Essentially, what it will do
> is take advantage of the fact that the UmlsToSnomedDbConsumerImpl will
> allow you to specify an SQL statement that maps the CUIs to codes, coupled
> with the fact that there already is a table called umls_ms_2011ab which
> contains the codes and CUIs from many different sources, including ICD9CM.
> What you could do is just reuse the table as the mapping table as well and
> specify the source such as:
> select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'
>
> (The downside is that I don't think there is an index on sourcetype, so
> performance may suck.)
> I've attached an example to normalize to ICD9CM codes instead of
> SNOMEDCT.
> <lookupConsumer
>     className="org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl">
>   <properties>
>     <property key="codingScheme" value="ICD9CM"/>
>     <property key="cuiMetaField" value="cui"/>
>     <property key="tuiMetaField" value="tui"/>
>     <property key="anatomicalSiteTuis"
>         value="T021,T022,T023,T024,T025,T026,T029,T030"/>
>     <property key="procedureTuis" value="T059,T060,T061"/>
>     <property key="disorderTuis"
>         value="T019,T020,T037,T046,T047,T048,T049,T050,T190,T191"/>
>     <property key="findingTuis"
>         value="T033,T034,T040,T041,T042,T043,T044,T045,T046,T056,T057,T184"/>
>     <property key="dbConnExtResrcKey" value="DbConnection"/>
>     <property key="mapPrepStmt"
>         value="select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'"/>
>   </properties>
> </lookupConsumer>
>
> On Fri, Sep 27, 2013 at 9:58 PM, Pei Chen
> <chen...@apache.org> wrote:
> James,
> One can try the NamedEntityLookupConsumerImpl instead of
> UmlsToSnomedDbConsumerImpl; it will not filter the hits down to only
> those CUIs that have SNOMED codes.
> Will you need to preserve the TUI? One thing is that
> NamedEntityLookupConsumerImpl will return all of the hits, except that
> it'll create OntologyConcepts (w/o TUIs) instead of UMLSConcepts. Perhaps
> we should make the NamedEntityLookupConsumerImpl a bit more general.
>
> --Pei
>
> On Fri, Sep 27, 2013 at 8:29 PM, Vogel, James
> <jvo...@activehealth.net> wrote:
> I now see that I use a query on umls_ms_2011ab where sourcetype =
> 'ICD9CM'. Is there a way to use an existing AE or class to add additional
> ICD9CM annotations / concepts or do I change the code in consumeHits() or
> getSnomedCodes()?
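A minimal sketch of what the mapPrepStmt query discussed above does, using Python's built-in sqlite3 module as a stand-in for the bundled HSQLDB database (an assumption made only so the example is self-contained). The table name, the column names cui, code and sourcetype, and the index on cui come from the thread; the three-column table layout, the CUIs and the codes are made-up placeholders, not real UMLS data.

```python
import sqlite3

# The mapping statement from the thread, unchanged.
MAP_PREP_STMT = (
    "select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'"
)


def build_demo_db():
    """Create an in-memory table shaped like the one the thread describes.

    Assumption: the real umls_ms_2011ab table has more columns; only the
    three used by the query are modelled here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE umls_ms_2011ab (cui TEXT, code TEXT, sourcetype TEXT)"
    )
    rows = [
        # Placeholder CUIs and codes, NOT real UMLS content.
        ("C0000001", "DEMO-SNOMED-1", "SNOMEDCT"),
        ("C0000001", "DEMO-ICD9-1", "ICD9CM"),
        ("C0000001", "DEMO-ICD9-2", "ICD9CM"),
        ("C0000002", "DEMO-SNOMED-2", "SNOMEDCT"),
    ]
    conn.executemany("INSERT INTO umls_ms_2011ab VALUES (?, ?, ?)", rows)
    # Same idea as the index James added: speed up lookups by CUI.
    conn.execute("CREATE INDEX umls_ms_2011ab_cui ON umls_ms_2011ab (cui)")
    return conn


def codes_for_cui(conn, cui):
    """Return the ICD9CM codes the mapping statement yields for one CUI."""
    return sorted(row[0] for row in conn.execute(MAP_PREP_STMT, (cui,)))


if __name__ == "__main__":
    conn = build_demo_db()
    for cui in ("C0000001", "C0000002"):
        print(cui, codes_for_cui(conn, cui))
```

The first placeholder CUI has ICD9CM rows and maps to two codes; the second has only a SNOMEDCT row, so the query returns nothing for it. If the consumer only keeps concepts for which the mapping statement returns a code, as the thread suggests for the default SNOMED setup, a CUI like the second one would not produce an annotation under this configuration.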
>
> -----Original Message-----
> From: Vogel, James
> Sent: Friday, September 27, 2013 6:30 PM
> To: dev@ctakes.apache.org
> Subject: RE: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
>
> Is anyone able to provide any more detailed guidance on what I'd need to
> change to add the ICD9 codes as tags, e.g., where do I look for the tables in
> the hsql database that would contain the ICD9 data?
>
> Thanks.
>
> -----Original Message-----
> From: Miller, Timothy
> [mailto:timothy.mil...@childrens.harvard.edu]
> Sent: Monday, September 16, 2013 7:25 AM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
>
> James,
> I haven't done it myself, so I don't know exactly how the config changes, but
> I know roughly where to look. In the LookupDesc_Db.xml, look for the
> <lookupBinding> tag with the idRef = DICT_UMLS_MS. Then look under the
> <lookupConsumer> section, and you'll see the codingScheme is SNOMED.
> I believe this is where the actual dictionary filtering is done. There is
> also a consumer class called
> org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl
> and a mapPrepStmt field with a SQL query that might need changing. That is
> where I would start looking. I'm not sure whether you would need to write a
> new consumer class, or what values the codingScheme field can take, but
> hopefully this helps you get started until someone else chimes in with more
> detailed info!
>
> Tim
>
> On 09/15/2013 08:39 PM, Vogel, James wrote:
> > Any more guidance you can give about the nature of the changes to the
> > config and impl that would need to be made to get the ICD9 codes?
> >
> > -----Original Message-----
> > From: Pei Chen [mailto:chen...@apache.org]
> > Sent: Wednesday, September 04, 2013 1:02 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: specificity in selecting EntityMentions when using
> > AggregatePlaintextUMLSProcessor
> >
> > Ted,
> >
> >> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
> >> familiar with how to access that information: In the example I've
> >> described below, where would I locate the ICD9 for a specific entity?
> >
> > Even though ICD9 is included in the lookup, IIRC, cTAKES by default is
> > configured [1] to only return/store concepts [2] that have a SNOMEDCT
> > code or RxNorm code.
> >
> > [1]
> > http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml
> >
> > [2]
> > http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/ae/UmlsToSnomedConsumerImpl.java
> >
> > If you would like it to return ICD9 codes, one would need to
> > modify/configure the above...
> >
> > --Pei
> >
> >
> > On Wed, Sep 4, 2013 at 11:55 AM, Assur, Ted
> > <theodore.as...@providence.org> wrote:
> >
> >> Thanks for looking into this, it's been puzzling me.
> >>
> >> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
> >> familiar with how to access that information: In the example I've
> >> described below, where would I locate the ICD9 for a specific entity?
> >>
> >> Thank you
> >>
> >> Ted
> >>
> >> -----Original Message-----
> >> From: Pei Chen [mailto:chen...@apache.org]
> >> Sent: Tuesday, September 03, 2013 7:13 PM
> >> To: dev@ctakes.apache.org
> >> Subject: Re: specificity in selecting EntityMentions when using
> >> AggregatePlaintextUMLSProcessor
> >>
> >> You're right, it should have gotten "CIN I"; that's a strange one,
> >> probably needs to be debugged/looked into further...
> >>
> >> On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy
> >> <timothy.mil...@childrens.harvard.edu> wrote:
> >>> Ah. So it will get
> >>> CIN 2 (in SNOMED)
> >>> CIN III (in SNOMED)
> >>> CIN 3 (in SNOMED)
> >>>
> >>> but the rest are not in SNOMED?
> >>>
> >>> I wonder why it doesn't get CIN I? It looks like that exists in
> >>> SNOMED (though I don't fully understand what all the symbols mean in
> >>> the umls browser).
> >>>
> >>>> CIN I - Cervical intraepithelial neoplasia 1
> >>>> [A3002690/SNOMEDCT/SY/285836003]
> >>>
> >>> On 09/03/2013 09:55 PM, Pei Chen wrote:
> >>>> It has the correct parse (POS, chunks, and lookupwindow), but some
> >>>> of the terms do not exist in SNOMED. CIN 2 - Cervical
> >>>> intraepithelial neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists,
> >>>> but not CIN II.
> >>>> CIN III [A3333965/SNOMEDCT/SY/20365006] also exists; that's why it
> >>>> was able to perform the lookup successfully.
> >>>> Note that CIN II synonyms do exist in other UMLS thesauruses such as
> >>>> MEDCIN, CCPSS though. However, the bundled cTAKES dictionaries
> >>>> only contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9) IIRC.
> >>>>
> >>>> --Pei
> >>>>
> >>>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy
> >>>> <timothy.mil...@childrens.harvard.edu> wrote:
> >>>>> That is a good question, Ted!
> >>>>>
> >>>>> I tried it with a simple context: "The patient has a CIN III."
> >>>>> I'm not sure if that is a correct context, but I was able to duplicate
> >>>>> your findings. (Finds a CUI for CIN III but not if you change it
> >>>>> to CIN II.)
> >>>>>
> >>>>> My first thought was that it is the chunker. But the chunker seems
> >>>>> to get it right, as CIN II and CIN III are both called NPs, and
> >>>>> similarly the LookupWindowAnnotator handles them both identically.
> >>>>> So that suggests it is a problem with the actual lookup of the
> >>>>> tokens in the LookupWindow.
> >>>>>
> >>>>> That's all I can do for now but maybe someone else who knows more
> >>>>> about its behavior offhand will have an idea.
> >>>>>
> >>>>> Tim
> >>>>>
> >>>>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
> >>>>>> I'm trying to understand what would prevent the
> >>>>>> AggregatePlaintextUMLSProcessor AE from correctly parsing specific
> >>>>>> problems that are defined in the UMLS version used by cTAKES.
> >>>>>> For example,
> >>>>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is
> >>>>>> parsed out as UMLS CUI C0206708.
> >>>>>> CIN comes in 3 grades: 1, 2 and 3. Sometimes this is reported
> >>>>>> with Roman numerals: I, II, and III.
> >>>>>> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI
> >>>>>> C0851140: "Carcinoma in situ of uterine cervix."
> >>>>>> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN
> >>>>>> II as their correct concepts, "Cervical intraepithelial neoplasia
> >>>>>> grade 1" and "Cervical intraepithelial neoplasia grade 2" respectively.
> >>>>>> Is there a way to tune the detection of UMLS concepts?
> >>>>>>
> >>>>>> --------------------------------------------
> >>>>>> Ted Assur
> >>>>>> IT Solutions Architect for Cancer Research
> >>>>>> Providence Health & Services
> >>>>>> ted.as...@providence.org
> >>>>>> 503-215-6476
> >>>>>>
> >>>>>> Crede, ut intelligas.
> >>>>>> Intellego, ut credam.
> >>>>>>
> >>>>>> ________________________________
> >>>>>>
> >>>>>> This message is intended for the sole use of the addressee, and
> >>>>>> may contain information that is privileged, confidential and exempt
> >>>>>> from disclosure under applicable law. If you are not the addressee
> >>>>>> you are hereby notified that you may not use, copy, disclose, or
> >>>>>> distribute to anyone the message or any information contained in the
> >>>>>> message. If you have received this message in error, please
> >>>>>> immediately advise the sender by reply email and delete this message.
> >
> > IMPORTANT WARNING: Information contained in this email is intended for
> > the use of the individual to whom it is addressed, and may contain
> > information that is privileged, confidential, and exempt from disclosure
> > under applicable law. If you are not the intended recipient, or the employee
> > or agent responsible for delivering the message to the intended recipient,
> > you are hereby notified that any dissemination, distribution, or copying of
> > this communication is STRICTLY FORBIDDEN. If you have received this
> > communication in error, please notify us immediately by return email and
> > delete this document. Thank you.