Re: How to update cTAKES so that new top level categories come out based on local dictionary?

Mattmann, Chris A (3980) Wed, 07 Oct 2015 21:46:59 -0700

OK, I think I figured this out. If I use the *full file path* to the
BSV file, rather than a classpath resource path, it seems to work.
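
As a rough sketch only, here is what the dictionary entry might look like
with an absolute path. It assumes the install directory and relative file
location given in the quoted message below; the combined path is an
illustration, not a value confirmed to be the one that worked. The same
value would go in the concept factory's bsvPath.

```xml
<dictionary>
   <name>CustomCuiRareWord</name>
   <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.BsvRareWordDictionary</implementationName>
   <properties>
      <!-- Assumption: absolute path built from the install directory and the
           relative location quoted below; adjust to your own layout. -->
      <property key="bsvPath"
                value="/usr/local/apache-ctakes-3.2.2-bin/resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv"/>
   </properties>
</dictionary>
```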


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





-----Original Message-----
From: jpluser <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, October 7, 2015 at 8:34 PM
To: "[email protected]" <[email protected]>
Subject: Re: How to update cTAKES so that new top level categories come
out based on local dictionary?

>Hi Sean,
>
>One more question too:
>
>So, I put the bsv files in the resources directory as part of my
>Apache cTAKES 3.2.2 distribution:
>
>/usr/local/apache-ctakes-3.2.2-bin/resources
>
>underneath:
>org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv
>
>and I referenced it like this (showing only the dictionary definition as an
>example; the path is the same for the concept factory):
>      <dictionary>
>         <name>CustomCuiRareWord</name>
>         <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.BsvRareWordDictionary</implementationName>
>         <properties>
>            <property key="bsvPath" value="resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv"/>
>         </properties>
>      </dictionary>
>
>
>Here’s what I see in the logs:
>
><snip>
>07 Oct 2015 20:31:01  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
>07 Oct 2015 20:31:01  INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
>07 Oct 2015 20:31:01  INFO DictionaryDescriptorParser - Parsing dictionary specifications: /data/hosts/web-dev.aws-redda.celgene.com/local/cdeploy/shangridocs/shangridocs-tika/ctakes/apache-ctakes-3.2.2/resources/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml
>07 Oct 2015 20:31:01  INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user chrismattmann: ..
>07 Oct 2015 20:31:02  INFO UmlsUserApprover -   UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user chrismattmann has been validated
>07 Oct 2015 20:31:02  INFO JdbcConnectionFactory - Connecting to jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx/ctakessnorx: ......
>07 Oct 2015 20:31:04  INFO JdbcConnectionFactory -  Database connected
>07 Oct 2015 20:31:04 ERROR BsvRareWordDictionary - resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv (No such file or directory)
>07 Oct 2015 20:31:04 ERROR BsvConceptFactory - resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv (No such file or directory)
></snip>
>
>
>I’ve tried all variants. For example, in the cTakesHsql.xml file I see
>resources used as a prefix for the hsqldb file, so I tried that too, and it
>doesn’t work. I’ve also tried it without resources as a prefix, and that
>doesn’t work either.
>
>Any ideas?
>
>Cheers,
>Chris
>
>-----Original Message-----
>From: "Finan, Sean" <[email protected]>
>Reply-To: "[email protected]" <[email protected]>
>Date: Tuesday, October 6, 2015 at 2:04 PM
>To: "[email protected]" <[email protected]>
>Subject: RE: How to update cTAKES so that new top level categories come
>out based on local dictionary?
>
>>Hi Chris,
>>
>>I use bsv to denote "bar separated value" - also known as "pipe
>>delimited".  I typically name the files with a ".bsv" extension, and they
>>are just plain old boring ascii flat files.
>>There should be multiple columns in the bsv file separated by the '|'
>>character.  The following are all valid per-line formats:
>>CUI|text
>>CUI|TUI|text
>>CUI|TUI|text|preferredText
>>It doesn't matter which format you choose, the parser will auto-detect
>>per-line.  Starting a line with "//" or "#" indicates that it is a
>>comment and should be ignored.
>>
>>
>>To add the bsv dictionary to your pipeline you just need to edit the
>>resources/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml file
>>and add a couple new sections.
>>Within the <dictionaries> section, add:
>>      <dictionary>
>>         <name>CustomCuiRareWord</name>
>>         <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.BsvRareWordDictionary</implementationName>
>>         <properties>
>>            <property key="bsvPath" value="org/apache/ctakes/dictionary/fast/example/custom_cui_tui_bsv.bsv"/>
>>         </properties>
>>      </dictionary>
>>Within the <conceptFactories> section, add:
>>      <conceptFactory>
>>         <name>CustomCuiConcept</name>
>>         <implementationName>org.apache.ctakes.dictionary.lookup2.concept.BsvConceptFactory</implementationName>
>>         <properties>
>>            <property key="bsvPath" value="org/apache/ctakes/dictionary/fast/example/custom_cui_tui_bsv.bsv"/>
>>         </properties>
>>      </conceptFactory>
>>Within the <dictionaryConceptPairs> section, add:
>>      <dictionaryConceptPair>
>>         <name>CustomPair</name>
>>         <dictionaryName>CustomCuiRareWord</dictionaryName>
>>         <conceptFactoryName>CustomCuiConcept</conceptFactoryName>
>>      </dictionaryConceptPair>
>>You can change all of the [Custom**] names, and you should obviously
>>point to the actual path of your bsv file.
>>
>>In addition to detecting your column count/style, upon loading the text
>>will be lower-cased and tokenized and the terms will be indexed by rare
>>word (for fast lookup).   Also, you do not need to write out the whole
>>"C1234567" or "T123" cui tui codes.  The default prefix characters and
>>padding zeros are automatically added.   Cuis "1" "01" "C1" "C01" will
>>all be stored as "C0000001" and Tuis are handled likewise.  If you have
>>custom cuis then it will honor non-"C" prefixes and still pad zeros
>>automatically based upon the longest entry.  For instance, if your bsv
>>has "CAM1", "CAM12" and "CAM12345" then the stored custom cuis should be
>>"CAM00001", "CAM00012" and "CAM12345".
>>
>>I think that is about all that there is to it ...
>>
>>Sean
>>
>>-----Original Message-----
>>From: Mattmann, Chris A (3980) [mailto:[email protected]]
>>Sent: Tuesday, October 06, 2015 4:31 PM
>>To: [email protected]
>>Subject: Re: How to update cTAKES so that new top level categories come
>>out based on local dictionary?
>>
>>Hi Sean,
>>
>>Thanks so much for your reply. For now I don’t care about the secondary
>>codes and I for sure have < 1000 terms. Can you tell me how to wire up
>>the BSV file by editing specific places in cTAKES? What specific commands
>>should I run or what format should the BSV file look like? I must admit
>>I have never heard of BSV files and the Internet varies on these between
>>Bluespec System Verilog and BASIC bsave files.
>>
>>Then after I make the BSV file, what steps next? Recompile cTAKES? Can
>>I take the BSV file and simply point to it from a binary installation of
>>cTAKES? Thank you!
>>
>>Cheers,
>>Chris
>>
>>-----Original Message-----
>>From: "Finan, Sean" <[email protected]>
>>Reply-To: "[email protected]" <[email protected]>
>>Date: Tuesday, October 6, 2015 at 8:05 AM
>>To: "[email protected]" <[email protected]>
>>Subject: RE: How to update cTAKES so that new top level categories come
>>out based on local dictionary?
>>
>>>Hi Chris,
>>>
>>>There are a few ways to do this:
>>>1.  Create an additional dictionary with the terms of interest and add it
>>>as a source
>>>2.  Create a new dictionary hsqldb that contains everything, old and new
>>>3.  Add to the existing hsqldb dictionary
>>>
>>>The best approach for you would probably depend upon
>>>1.  How many new terms you have
>>>2.  Whether or not you desire additional codes, i.e. rxnorm, snomed
>>>
>>>If you don't have many new terms (<1000) and you don't care about
>>>secondary codes then the easiest thing would be to create a BSV file with
>>>the new terms and cuis.
>>>
>>>If you have a lot of new terms or do care about secondary codes, then a
>>>less facile solution would be to create a new hsqldb with only the new
>>>info or a complete replacement with new and old/existing terms.  Of the
>>>two hsql options creating a new all-inclusive database would probably be
>>>easier unless you want to learn the ins and outs of hsql.  If all of the
>>>terms are in the umls, then the new all-inclusive hsqldb would definitely
>>>be easiest (I think) as you could use the dictionary tool to create it.
>>>
>>>If you let me know your exact situation then I may be able to better
>>>expound.
>>>
>>>Sean
>>>
>>>-----Original Message-----
>>>From: Mattmann, Chris A (3980) [mailto:[email protected]]
>>>Sent: Monday, October 05, 2015 7:36 PM
>>>To: [email protected]
>>>Subject: How to update cTAKES so that new top level categories come out
>>>based on local dictionary?
>>>
>>>Hi cTAKES team,
>>>
>>>Hope you’re well! I had a quick question. I was wondering if someone
>>>could provide me a step-by-step guide to updating cTAKES to be based
>>>off a local dictionary, so that in addition to e.g.,
>>>
>>>ProceduralMention
>>>  Value1 position etc
>>>  Value2 position etc
>>>
>>>MedicationMention
>>>  Value1 position etc
>>>  Value2 position etc
>>>
>>>I would also get:
>>>
>>>NewTopLevelCategoryFromMyDictionary
>>>  FoundValue1 position etc
>>>  FoundValue2 position etc
>>>
>>>I realize this has something to do with updating the annotation
>>>descriptions etc in XML, so if someone could just tell me what
>>>to update I’d really appreciate it.
>>>
>>>Thank you!
>>>
>>>Cheers,
>>>Chris
>>>
