OK, I think I figured this out. If I use the *full file path* rather than a classpath resource path or other relative path, it seems to work.
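One possible reading of this result, which is an assumption and not something confirmed in the thread, is that the `bsvPath` value is opened as an ordinary file. In that case a relative value is resolved against the JVM's working directory, not against the cTAKES install or the classpath. The "(No such file or directory)" suffix in the log errors quoted below is the message format of `java.io.FileNotFoundException` on Unix-like systems, which fits that reading. It may also be worth checking that the log shows cTakesHsql.xml being read from an apache-ctakes-3.2.2 directory under /data/hosts/, while the bsv file was placed under /usr/local/apache-ctakes-3.2.2-bin/. The hypothetical `BsvPathCheck` class below is not part of cTAKES. It prints where a relative and an absolute `bsvPath` would resolve:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical diagnostic helper (not part of cTAKES). Shows where a bsvPath
 * value lands if it is opened as an ordinary filesystem path.
 */
class BsvPathCheck {

    /** Relative paths are resolved against the JVM's current working directory. */
    static Path resolve(String configuredPath) {
        return Paths.get(configuredPath).toAbsolutePath().normalize();
    }

    /** True only if the resolved path is an existing regular file. */
    static boolean exists(String configuredPath) {
        return Files.isRegularFile(resolve(configuredPath));
    }

    public static void main(String[] args) {
        String relative = "resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv";
        // Absolute form of the same file, using the install directory mentioned in the thread.
        String absolute = "/usr/local/apache-ctakes-3.2.2-bin/" + relative;
        System.out.println("JVM working directory: " + System.getProperty("user.dir"));
        for (String p : new String[] {relative, absolute}) {
            System.out.println(p + "\n  -> " + resolve(p) + " (exists: " + exists(p) + ")");
        }
    }
}
```

If the printed working directory is not the directory that contains the `resources` folder holding the bsv file, the relative value points somewhere else. That would be consistent with the errors, and an absolute `bsvPath` sidesteps the problem.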
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: jpluser <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, October 7, 2015 at 8:34 PM
To: "[email protected]" <[email protected]>
Subject: Re: How to update cTAKES so that new top level categories come out based on local dictionary?

>Hi Sean,
>
>One more question:
>
>I put the bsv files in the resources directory of my Apache cTAKES 3.2.2
>distribution:
>
>/usr/local/apache-ctakes-3.2.2-bin/resources
>
>underneath:
>org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv
>
>and referenced it like this (showing only the dictionary definition; the
>path is the same for the concept factory):
>
>  <dictionary>
>    <name>CustomCuiRareWord</name>
>    <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.BsvRareWordDictionary</implementationName>
>    <properties>
>      <property key="bsvPath" value="resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv"/>
>    </properties>
>  </dictionary>
>
>Here’s what I see in the logs:
>
><snip>
>07 Oct 2015 20:31:01 INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
>07 Oct 2015 20:31:01 INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
>07 Oct 2015 20:31:01 INFO DictionaryDescriptorParser - Parsing dictionary specifications: /data/hosts/web-dev.aws-redda.celgene.com/local/cdeploy/shangridocs/shangridocs-tika/ctakes/apache-ctakes-3.2.2/resources/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml
>07 Oct 2015 20:31:01 INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user chrismattmann:
>..
>07 Oct 2015 20:31:02 INFO UmlsUserApprover - UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user chrismattmann has been validated
>07 Oct 2015 20:31:02 INFO JdbcConnectionFactory - Connecting to jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx/ctakessnorx:
>......
>07 Oct 2015 20:31:04 INFO JdbcConnectionFactory - Database connected
>07 Oct 2015 20:31:04 ERROR BsvRareWordDictionary - resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv (No such file or directory)
>07 Oct 2015 20:31:04 ERROR BsvConceptFactory - resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv (No such file or directory)
></snip>
>
>I’ve tried all the variants. For example, in cTakesHsql.xml the hsqldb file
>path has "resources" as a prefix, so I tried that too, and it doesn’t work.
>I’ve also tried it without the "resources" prefix, and that doesn’t work
>either.
>
>Any ideas?
>
>Cheers,
>Chris
>
>-----Original Message-----
>From: "Finan, Sean" <[email protected]>
>Reply-To: "[email protected]" <[email protected]>
>Date: Tuesday, October 6, 2015 at 2:04 PM
>To: "[email protected]" <[email protected]>
>Subject: RE: How to update cTAKES so that new top level categories come
>out based on local dictionary?
>
>>Hi Chris,
>>
>>I use bsv to denote "bar separated value", also known as "pipe
>>delimited". I typically name the files with a ".bsv" extension, and they
>>are just plain old boring ASCII flat files.
>>There should be multiple columns in the bsv file, separated by the '|'
>>character. The following are all valid per-line formats:
>>CUI|text
>>CUI|TUI|text
>>CUI|TUI|text|preferredText
>>It doesn't matter which format you choose; the parser auto-detects the
>>format per line. A line starting with "//" or "#" is treated as a
>>comment and ignored.
>>
>>To add the bsv dictionary to your pipeline you just need to edit the
>>resources/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml file
>>and add a couple of new sections.
>>Within the <dictionaries> section, add:
>>  <dictionary>
>>    <name>CustomCuiRareWord</name>
>>    <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.BsvRareWordDictionary</implementationName>
>>    <properties>
>>      <property key="bsvPath" value="org/apache/ctakes/dictionary/fast/example/custom_cui_tui_bsv.bsv"/>
>>    </properties>
>>  </dictionary>
>>Within the <conceptFactories> section, add:
>>  <conceptFactory>
>>    <name>CustomCuiConcept</name>
>>    <implementationName>org.apache.ctakes.dictionary.lookup2.concept.BsvConceptFactory</implementationName>
>>    <properties>
>>      <property key="bsvPath" value="org/apache/ctakes/dictionary/fast/example/custom_cui_tui_bsv.bsv"/>
>>    </properties>
>>  </conceptFactory>
>>Within the <dictionaryConceptPairs> section, add:
>>  <dictionaryConceptPair>
>>    <name>CustomPair</name>
>>    <dictionaryName>CustomCuiRareWord</dictionaryName>
>>    <conceptFactoryName>CustomCuiConcept</conceptFactoryName>
>>  </dictionaryConceptPair>
>>You can change all of the [Custom**] names, and you should obviously
>>point to the actual path of your bsv file.
>>
>>In addition to detecting your column count/style, upon loading the text
>>will be lower-cased and tokenized, and the terms will be indexed by rare
>>word (for fast lookup). Also, you do not need to write out the whole
>>"C1234567" or "T123" cui/tui codes. The default prefix characters and
>>padding zeros are added automatically. Cuis "1", "01", "C1" and "C01" will
>>all be stored as "C0000001", and Tuis are handled likewise. If you have
>>custom cuis, then it will honor non-"C" prefixes and still pad zeros
>>automatically based upon the longest entry. For instance, if your bsv
>>has "CAM1", "CAM12" and "CAM12345", then the stored custom cuis should be
>>"CAM00001", "CAM00012" and "CAM12345".
>>
>>I think that is about all there is to it ...
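The bsv rules described above (per-line column detection, "//" and "#" comment lines, lower-cased text, zero-padded CUIs) can be illustrated with the following Java sketch. `BsvSketch` is a hypothetical class written for this explanation, not the code of cTAKES's `BsvRareWordDictionary` or `BsvConceptFactory`. Tokenization and rare-word indexing are left out. Two behaviors are assumptions based on the wording and the "CAM" example: the preferred text is kept unchanged, and a custom prefix is padded to the longest digit run seen for that prefix. TUI padding ("T" plus 3 digits) would follow the same pattern and is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of the bsv rules described in the thread (not cTAKES code). */
class BsvSketch {

    /** One parsed row; tui and preferredText are null when the line omits them. */
    static final class Row {
        final String cui, tui, text, preferredText;

        Row(String cui, String tui, String text, String preferredText) {
            this.cui = cui;
            this.tui = tui;
            this.text = text;
            this.preferredText = preferredText;
        }
    }

    /** Returns null for blank or comment ("//", "#") lines; otherwise the parsed row. */
    static Row parseLine(String line) {
        String s = line.trim();
        if (s.isEmpty() || s.startsWith("//") || s.startsWith("#")) {
            return null;
        }
        String[] c = s.split("\\|", -1);
        switch (c.length) { // the format is detected from the column count
            case 2: return new Row(c[0].trim(), null, lower(c[1]), null);
            case 3: return new Row(c[0].trim(), c[1].trim(), lower(c[2]), null);
            case 4: return new Row(c[0].trim(), c[1].trim(), lower(c[2]), c[3].trim());
            default: throw new IllegalArgumentException("Expected 2 to 4 columns: " + line);
        }
    }

    private static String lower(String text) {
        return text.trim().toLowerCase(Locale.ROOT);
    }

    /** Splits "CAM12" into {"CAM", "12"}; a bare number gets the default prefix. */
    private static String[] splitPrefix(String code, String defaultPrefix) {
        int i = 0;
        while (i < code.length() && !Character.isDigit(code.charAt(i))) {
            i++;
        }
        String prefix = (i == 0) ? defaultPrefix : code.substring(0, i);
        return new String[] {prefix, code.substring(i)};
    }

    /** Pads "C" cuis to 7 digits and custom prefixes to their longest digit run (assumption). */
    static List<String> normalizeCuis(List<String> rawCuis) {
        Map<String, Integer> width = new HashMap<>();
        for (String raw : rawCuis) {
            String[] pd = splitPrefix(raw.trim(), "C");
            width.merge(pd[0], pd[1].length(), Math::max);
        }
        width.computeIfPresent("C", (prefix, w) -> Math.max(w, 7)); // UMLS CUIs: C + 7 digits
        List<String> out = new ArrayList<>();
        for (String raw : rawCuis) {
            String[] pd = splitPrefix(raw.trim(), "C");
            StringBuilder sb = new StringBuilder(pd[0]);
            for (int k = pd[1].length(); k < width.get(pd[0]); k++) {
                sb.append('0');
            }
            out.add(sb.append(pd[1]).toString());
        }
        return out;
    }

    public static void main(String[] args) {
        Row r = parseLine("C01|T047|Example Term|Example Preferred Term");
        System.out.println(r.cui + " / " + r.tui + " / " + r.text + " / " + r.preferredText);
        System.out.println(normalizeCuis(Arrays.asList("1", "01", "C1", "C01")));
        System.out.println(normalizeCuis(Arrays.asList("CAM1", "CAM12", "CAM12345")));
    }
}
```

Running `main` prints `C01 / T047 / example term / Example Preferred Term`, then `[C0000001, C0000001, C0000001, C0000001]` and `[CAM00001, CAM00012, CAM12345]`, matching the padding examples in the message.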
>>
>>Sean
>>
>>-----Original Message-----
>>From: Mattmann, Chris A (3980) [mailto:[email protected]]
>>Sent: Tuesday, October 06, 2015 4:31 PM
>>To: [email protected]
>>Subject: Re: How to update cTAKES so that new top level categories come
>>out based on local dictionary?
>>
>>Hi Sean,
>>
>>Thanks so much for your reply. For now I don’t care about the secondary
>>codes, and I definitely have fewer than 1000 terms. Can you tell me how to
>>wire up the BSV file by editing specific places in cTAKES? What specific
>>commands should I run, and what should the BSV file look like? I must
>>admit I have never heard of BSV files, and searching the Internet turns up
>>everything from Bluespec System Verilog to BASIC bsave files.
>>
>>After I make the BSV file, what are the next steps? Recompile cTAKES? Can
>>I take the BSV file and simply point to it from a binary installation of
>>cTAKES? Thank you!
>>
>>Cheers,
>>Chris
>>
>>-----Original Message-----
>>From: "Finan, Sean" <[email protected]>
>>Reply-To: "[email protected]" <[email protected]>
>>Date: Tuesday, October 6, 2015 at 8:05 AM
>>To: "[email protected]" <[email protected]>
>>Subject: RE: How to update cTAKES so that new top level categories come
>>out based on local dictionary?
>>
>>>Hi Chris,
>>>
>>>There are a few ways to do this:
>>>1. Create an additional dictionary with the terms of interest and add it
>>>   as a source
>>>2. Create a new dictionary hsqldb that contains everything, old and new
>>>3. Add to the existing hsqldb dictionary
>>>
>>>The best approach for you would probably depend upon
>>>1. How many new terms you have
>>>2. Whether or not you want additional codes, e.g. RxNorm or SNOMED
>>>
>>>If you don't have many new terms (<1000) and you don't care about
>>>secondary codes, then the easiest thing would be to create a BSV file
>>>with the new terms and cuis.
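Creating such a BSV file needs nothing more than writing pipe-delimited lines. The hypothetical `BsvWriterSketch` class below (not part of cTAKES) writes terms in the CUI|TUI|text layout, with a "#" comment header. The CUI values "CAM1" and "CAM12" mirror the custom-CUI example elsewhere in the thread, and "T047" is only a placeholder TUI. The file is written as UTF-8, which is byte-for-byte identical to ASCII for plain ASCII terms.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of cTAKES) that writes terms as CUI|TUI|text lines. */
class BsvWriterSketch {

    /** A term to add; the codes are whatever CUI/TUI values you assign. */
    static final class Term {
        final String cui, tui, text;

        Term(String cui, String tui, String text) {
            this.cui = cui;
            this.tui = tui;
            this.text = text;
        }
    }

    /** Formats one term as "CUI|TUI|text", rejecting text that contains the '|' delimiter. */
    static String toLine(Term t) {
        if (t.text.indexOf('|') >= 0) {
            throw new IllegalArgumentException("'|' is the column delimiter: " + t.text);
        }
        return t.cui + "|" + t.tui + "|" + t.text;
    }

    /** Writes a "#" comment header followed by one line per term. */
    static void write(Path file, List<Term> terms) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add("# CUI|TUI|text");
        for (Term t : terms) {
            lines.add(toLine(t));
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder codes only; substitute your own cuis/tuis.
        List<Term> terms = Arrays.asList(
                new Term("CAM1", "T047", "example new term"),
                new Term("CAM12", "T047", "another example term"));
        Path out = Files.createTempFile("custom_terms", ".bsv");
        write(out, terms);
        Files.readAllLines(out, StandardCharsets.UTF_8).forEach(System.out::println);
    }
}
```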
>>>
>>>If you have a lot of new terms or do care about secondary codes, then a
>>>less facile solution would be to create a new hsqldb with only the new
>>>info, or a complete replacement with new and old/existing terms. Of the
>>>two hsql options, creating a new all-inclusive database would probably be
>>>easier unless you want to learn the ins and outs of hsql. If all of the
>>>terms are in the UMLS, then the new all-inclusive hsqldb would definitely
>>>be easiest (I think), as you could use the dictionary tool to create it.
>>>
>>>If you let me know your exact situation, then I may be able to expound
>>>further.
>>>
>>>Sean
>>>
>>>-----Original Message-----
>>>From: Mattmann, Chris A (3980) [mailto:[email protected]]
>>>Sent: Monday, October 05, 2015 7:36 PM
>>>To: [email protected]
>>>Subject: How to update cTAKES so that new top level categories come out
>>>based on local dictionary?
>>>
>>>Hi cTAKES team,
>>>
>>>Hope you’re well! I had a quick question. I was wondering if someone
>>>could provide me a step-by-step guide to updating cTAKES to be based
>>>off a local dictionary, so that in addition to, e.g.,
>>>
>>>ProceduralMention
>>>  Value1 position etc
>>>  Value2 position etc
>>>
>>>MedicationMention
>>>  Value1 position etc
>>>  Value2 position etc
>>>
>>>I would also get:
>>>
>>>NewTopLevelCategoryFromMyDictionary
>>>  FoundValue1 position etc
>>>  FoundValue2 position etc
>>>
>>>I realize this has something to do with updating the annotation
>>>descriptions etc. in XML, so if someone could just tell me what
>>>to update, I’d really appreciate it.
>>>
>>>Thank you!
>>>
>>>Cheers,
>>>Chris
