Thanks Finan and Brandon, I appreciate your help a lot. I downloaded the dictionary tool from https://svn.apache.org/repos/asf/ctakes/sandbox/dictionarytool/bin/dictionarytool.zip and I hope it's the latest version and bug-free.

My command is:

java -cp ./dictionarytool.jar:lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls /home/abarari/Desktop/umls/2015AB/META/ -atui ./data/optional/CtakesAnatTuis.txt -db jdbc:hsqldb:file:/home/abarari/Desktop/dictionarytool/output/ctakesicd2015 -tbl CUI_TERMS -df ./data/optional/ -src ./data/small/ConversionSources.txt -tui ./data/optional/CtakesAllTuis.txt

I am running on Ubuntu, by the way. Under /home/abarari/Desktop/dictionarytool/output/ there are only these files:

abarari@ubuntu:~/Desktop/dictionarytool/output$ ls
ctakesicd2015.log  ctakesicd2015.properties  ctakesicd2015.script

Where is the database? Am I doing something wrong? Do I need to create the database before running the dictionary tool?

I also found a couple of issues in the dictionary tool: it does not work well with relative paths.

On Wed, Dec 9, 2015 at 7:11 AM, Pei Chen <[email protected]> wrote:
> Brandon,
> That sounds great!
> Please open a Jira ticket for any contributions (anyone should be able to create a Jira account). There are some legal items built into the ASF Jira attachments for accepting contributions/donations.
> It will also credit the contributors with the merit appropriately.
> Anyone who is interested can follow the Jira item. (Even better if contributions were open discussion/open development.)
> --Pei
>
> On Tue, Dec 8, 2015 at 10:36 PM, Geise, Brandon D. <[email protected]> wrote:
> > I'd be interested in contributing to making the dictionary tool more user friendly with a GUI.
> >
> > Thanks,
> > Brandon
> >
> > -----Original Message-----
> > From: Finan, Sean [mailto:[email protected]]
> > Sent: Tuesday, December 08, 2015 6:12 PM
> > To: [email protected]
> > Subject: RE: ctakes with icd10; 2015 versions available on sourceforge!
> >
> > Hi Dave,
> >
> > I'm always happy to see interest in our stuff!
> >
> >>Step 1
> > I built the tool to be able to build a dictionary using anything in the umls - snomed, icd9, hpo, etc. so using the veterinary extension shouldn't be a problem. You just add it to the CtakesSources file (or create an alternate file and point to it with -src). To answer another of your questions, there can be zero or more sources - you saw snomedct and snomedct_us (each valid in a different umls version).
> > It also can include any semantic type, just add (or remove) the appropriate tuis in a different data file.
> >
> >>Step 2
> > You have it right - you copy the templates to another location and output to that location. Otherwise you 'lose' your templates.
> >
> >>Step 3 and 4
> > The jar is built from source. I need to (soon) check in updates to the source, and at the same time I can check in a default prebuilt .jar. The lib/ directory is in the source repository.
> >
> > Various people have toyed with the idea of putting the tool into a ctakes module, putting it into an "installation package", making a gui ... The best option (imo) is probably to make an easy to use gui and keep a pre-built version in sandbox. Someday, after the rainbow, maybe I'll get a chance to do that ...
> >
> > Sean
> >
> >
> > -----Original Message-----
> > From: David Kincaid [mailto:[email protected]]
> > Sent: Tuesday, December 08, 2015 4:57 PM
> > To: [email protected]
> > Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
> >
> > Thanks, Sean! It's great that cTAKES may soon have an up to date database out of the box. Hopefully it will cut down on the need for many to build their own DB's. Thank you much for doing that.
> >
> > Unfortunately, I still will need to build a custom one for us. I work in veterinary medicine so I need to add in the veterinary extension for SNOMED-CT into the database.
> >
> > I looked over the steps below that Brandon included and have some questions:
> >
> > step 1 says to "Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US". The file that I have has two lines in it.
> > First line is SNOMED, second line is SNOMEDCT_US. So this step doesn't really make sense.
> >
> > step 2 should reference the two scripts as being in resource/memdbtemplate so others don't have to search for them. Not sure what it means to move them to "location to put new UMLS DB". Does that mean move them into a new directory where the newly created UMLS DB will get written?
> >
> > steps 3 and 4 for running the tools reference dictionarytool.jar which doesn't exist. Does one need to build that somehow from the source before running it? The command line also adds "lib/*" to the classpath. Is that the lib directory inside the dictionarytool source code or some other location?
> >
> > What else would I need to do to include the SNOMED-CT Veterinary Extension along with the snomedct and rxnorm sources?
> >
> > I'll probably not have time to try this out for a while yet, but when I do I'd be happy to write up an easy to follow tutorial for building a custom dictionary assuming I am able to get it to work.
> >
> > Has anyone considered making this tool available outside of the source code itself? Like including it in the main cTAKES release? It seems there is demand for it.
> >
> > - Dave
> >
> > On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean <[email protected]> wrote:
> >
> >> Hi Brandon, thanks for finding and forwarding the instructions!
> >>
> >> I have checked in two new hsqldb dictionaries, both from the 2015AB version of the UMLS. They both have codes for snomedct_us, rxnorm, icd9cm and icd10pcs - as well as the usual cui, tui, preferred term mappings.
> >>
> >> One uses cuis filtered by snomed and rxnorm, the other adds cuis filtered by icd9 and icd10.
> >> What this means: Cuis that exist for a [filter source] are added to the dictionary, as are all text variations from all sources that contain that cui. Both dictionaries also use the standard ctakes semantic group tui filters.
> >>
> >> The names are ctakessnorx2015 and ctakesicd2015
> >>
> >> The snomed rxnorm:
> >>
> >> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx2015/
> >>
> >> The snomed rxnorm icd9 icd10:
> >>
> >> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
> >>
> >> The svn root for the whole ugly thing is:
> >> svn checkout svn://svn.code.sf.net/p/ctakesresources/code/trunk
> >>
> >> Stats:
> >> ctakessnorx2015
> >>   545,913 Terms
> >>   229,251 Concepts (Cuis)
> >>   272,987 Snomed codes
> >>   32,419 Rxnorm codes
> >>   11,321 icd9 codes
> >>   61 icd10 codes
> >>
> >> Ctakesicd2015
> >>   611,230 Terms
> >>   282,211 Concepts
> >>   18,626 icd9 codes
> >>   45,818 icd10 codes
> >>   Snomed and Rxnorm counts are the same
> >>
> >> So, adding the icd filters gave us an extra ~53,000 concepts and ~65,000 terms.
> >>
> >> I would like to move this all to a better root (not ctakes-resources-snomed-rword-hsqldb-2011ab) but I wasn't able to write directly in trunk (??) and need to get moving on to other things.
> >>
> >> There is help on the ctakes wiki:
> >> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
> >> Though I should probably add a few items ...
> >>
> >>
> >> Sean
> >>
> >>
> >> -----Original Message-----
> >> From: Geise, Brandon D. [mailto:[email protected]]
> >> Sent: Tuesday, December 08, 2015 12:51 PM
> >> To: [email protected]
> >> Subject: RE: ctakes with icd10
> >>
> >> Not to perpetuate the instructions again but I sent these out not long ago when I was going through the process and Sean was helping me.
> >>
> >> 1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US"
> >> 2. Copy ctakesumls.properties and ctakesumls.script from memdbtemplate to location to put new UMLS DB
> >> 3. Run DictionaryCreator2
> >>    java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
> >> 4. Run CodeMapCreator
> >>    java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
> >> 5. Copy new DB files to new location and create a copy of cTakesHsql.xml and update dictionary location
> >>
> >> Thanks,
> >> Brandon
> >>
> >> -----Original Message-----
> >> From: David Kincaid [mailto:[email protected]]
> >> Sent: Tuesday, December 08, 2015 12:47 PM
> >> To: [email protected]
> >> Subject: Re: ctakes with icd10
> >>
> >> This seems like a pretty common request and with such an old version of UMLS database shipped with cTAKES it's only going to get worse. I've been wanting to build a dictionary using the latest UMLS release (as well as a custom database), so would be happy to write up the steps as I go through it. That assumes that I can dig up the instructions in the dev list.
> >>
> >> - Dave
> >>
> >> On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean <[email protected]> wrote:
> >>
> >> > Hi Alaa,
> >> >
> >> > The -shortest- answer is that you'll need to run the dictionary creation tool. There are instructions in older devlist threads. By default the dictionary creation tool does add icd9 and icd10 tables to the dictionary.
> >> > The problem is that in Umls 2011AB those codes weren't very well populated. The 2015AB icd# set is much more rich so those tables should be pretty good. Then in ctakes you would look up annotations by icd9 or icd10 codes instead of by cui:
> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icd#Code );
> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, icd#Code );
> >> >
> >> > Sean
> >> >
> >> > -----Original Message-----
> >> > From: Savova, Guergana [mailto:[email protected]]
> >> > Sent: Tuesday, December 08, 2015 12:17 PM
> >> > To: [email protected]
> >> > Subject: RE: ctakes with icd10
> >> >
> >> > Hi Alaa,
> >> > You need to create a resource off the terminology/ontology you want to use (in this case ICD9 or ICD10). Then run that resource with cTAKES for the fast dictionary lookup.
> >> > There is cTAKES code and some documentation on how to create that resource. By default, cTAKES runs with a resource created from the English version of SNOMED CT and RxNORM.
> >> > Hope this helps.
> >> > --Guergana
> >> >
> >> > -----Original Message-----
> >> > From: Alaa al Barari [mailto:[email protected]]
> >> > Sent: Tuesday, December 8, 2015 10:01 AM
> >> > To: [email protected]
> >> > Subject: ctakes with icd10
> >> >
> >> > Hi,
> >> >
> >> > I downloaded Latest umls version, and I want to know how to make ctakes work with icd10 and icd9.
> >> >
> >> >
> >> > Thanks
> >> >
> >>
> >>
> >> IMPORTANT WARNING: The information in this message (and the documents attached to it, if any) is confidential and may be legally privileged. It is intended solely for the addressee. Access to this message by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken, or omitted to be taken, in reliance on it is prohibited and may be unlawful. If you have received this message in error, please delete all electronic copies of this message (and the documents attached to it, if any), destroy any hard copies you may have created and notify me immediately by replying to this email. Thank you.
> >>
> >> Geisinger Health System utilizes an encryption process to safeguard Protected Health Information and other confidential data contained in external e-mail messages. If email is encrypted, the recipient will receive an e-mail instructing them to sign on to the Geisinger Health System Secure E-mail Message Center to retrieve the encrypted e-mail.
> >>
>

--
Eng Alaa Al-Barari
phone 0599297470
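
P.S. Since the tool does not seem to handle relative paths well, one possible workaround is to hand it only absolute paths. Below is a minimal sketch of that idea, assuming Python 3 is available. The helper name build_command and its parameters are made up for illustration; the flags, class name, and data file names are the ones from my command at the top of this message. I have not verified that this fixes the tool's behaviour, it only builds the argument list.

```python
import os
from pathlib import Path

JDBC_PREFIX = "jdbc:hsqldb:file:"


def build_command(tool_dir, umls_meta, db_dir, db_name):
    """Build the DictionaryCreator2 argument list with every path made absolute.

    All four parameters are hypothetical names chosen for this sketch:
    tool_dir is the unzipped dictionarytool directory, umls_meta is the
    UMLS META directory, db_dir/db_name say where the HSQLDB files go.
    """
    tool = Path(tool_dir).resolve()
    optional = tool / "data" / "optional"
    # The classpath separator is ':' on Linux and ';' on Windows.
    classpath = os.pathsep.join(
        [str(tool / "dictionarytool.jar"), str(tool / "lib" / "*")]
    )
    return [
        "java", "-cp", classpath,
        "org.apache.ctakes.dictionarytool.DictionaryCreator2",
        # Trailing separators are kept to match the original command.
        "-umls", str(Path(umls_meta).resolve()) + os.sep,
        "-atui", str(optional / "CtakesAnatTuis.txt"),
        "-db", JDBC_PREFIX + str(Path(db_dir).resolve() / db_name),
        "-tbl", "CUI_TERMS",
        "-df", str(optional) + os.sep,
        "-src", str(tool / "data" / "small" / "ConversionSources.txt"),
        "-tui", str(optional / "CtakesAllTuis.txt"),
    ]


if __name__ == "__main__":
    # Example values only; adjust them to the local layout.
    cmd = build_command(".", "/home/abarari/Desktop/umls/2015AB/META",
                        "output", "ctakesicd2015")
    print(" ".join(cmd))
```

The list can be passed as-is to subprocess.run(cmd, check=True). Passing a list avoids the shell, so lib/* reaches java unexpanded and java resolves the classpath wildcard itself.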
