Option 1: 
hg clone https://bitbucket.org/invitae/medgen-mysql
cd medgen-mysql 
make user 
make medgen 

Then we point the cTAKES DictionaryLookup at the resulting MySQL database,
which is nicely indexed, customizable, linkable, etc.

Option 2: 
hg clone https://bitbucket.org/invitae/medgen-mysql
cd medgen-mysql 
./mirror.sh medgen/urls 

This will fetch the dictionary files; then replace MRCONSO with MGCONSO from
the medgen/mirror directory.
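
After the mirror has been fetched, a quick sanity check is to count terms per
source vocabulary. This is only a rough sketch: it assumes MGCONSO is a
pipe-delimited file whose first line is a '#'-prefixed header naming an SAB
column. The function name and the sample rows are made up for illustration, so
check the actual file path and layout in the mirror before relying on it.

```python
from collections import Counter


def count_by_source(lines, column="SAB", sep="|"):
    """Count rows per value of `column` in a delimited file with a '#' header row."""
    header = None
    idx = None
    counts = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if header is None:
            # Assumption: the first non-empty line names the columns,
            # e.g. "#CUI|...|SAB|...".
            header = line.lstrip("#").split(sep)
            idx = header.index(column)  # raises ValueError if the column is absent
            continue
        fields = line.split(sep)
        if len(fields) > idx:
            counts[fields[idx]] += 1
    return counts


# Made-up rows for illustration only; this is not real MedGen data.
SAMPLE = [
    "#CUI|SAB|STR|\n",
    "C0000001|MSH|example term one|\n",
    "C0000001|SNOMEDCT_US|example term one|\n",
    "C0000002|MSH|example term two|\n",
]

if __name__ == "__main__":
    for vocab, cnt in count_by_source(SAMPLE).most_common():
        print(f"{vocab}\t{cnt}")
```

To run it against the real file, pass an open file handle instead of SAMPLE,
e.g. count_by_source(open(path, encoding="utf-8")), where path is wherever the
mirror placed MGCONSO on your machine.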

Option 3: 
Bake this process directly into the cTAKES installation.

I am interested in what you and others think would be the fastest way to get
new users online with cTAKES.

Hope this helps, 
[email protected]

On Oct 2, 2015, at 7:02 AM, "Mattmann, Chris A (3980)" 
<[email protected]> wrote:

> Hi,
> 
> I would be extremely interested in a sample dictionary that
> doesn’t require a UMLS login.
> 
> How would I use this?
> 
> Thanks,
> Chris
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> 
> 
> 
> 
> -----Original Message-----
> From: "[email protected] (forwarding)" <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Friday, October 2, 2015 at 12:43 AM
> To: "[email protected]" <[email protected]>
> Subject: building a *real sample dictionary* without UMLS login
> 
>> Greetings ctakes-dev!
>> 
>> I have been polishing MedGen (UMLS) dictionaries for over a year now and
>> I am confident in saying "this is solid".
>> As a reminder, the medgen-mysql package contains a large subset of the
>> UMLS that can be downloaded without UMLS login, greatly simplifying the
>> creation of an example dictionary.
>> 
>> QUESTION: 
>> Would you like me to integrate this into ctakes to simplify installations
>> for new users, and if so, what would be your preferred method?
>> 
>> Source Vocabularies (SAB)
>> +-------------+--------+
>> | SourceVocab | cnt    |
>> +-------------+--------+
>> | MSH         | 245435 | Medical Subject Headings
>> | SNOMEDCT_US | 156105 | SNOMED Clinical Terms
>> | NCI         | 136888 | NCI Cancer Terms
>> | ...         |  ...   |
>> +-------------+--------+
>> 
>> Semantic Types (STY)
>> +-------------------------------------------+--------+
>> | SemanticType                              | cnt    |
>> +-------------------------------------------+--------+
>> | Pharmacologic Substance                   | 102511 |
>> | Finding                                   |  90413 |
>> | Organic Chemical                          |  81329 |
>> | Disease or Syndrome                       |  47223 |
>> | Neoplastic Process                        |  16151 |
>> | Amino Acid, Peptide, or Protein           |   9383 |
>> | Congenital Abnormality                    |   6536 |
>> | Pathologic Function                       |   5655 |
>> | Steroid                                   |   3919 |
>> | Sign or Symptom                           |   2909 |
>> | ...                                       |   ...  |
>> +-------------------------------------------+--------+
>> 
>> 
>> What would you like to see?
>> [email protected]    
>> 
>> 
>> On Nov 12, 2014, at 6:14 AM, "Dligach, Dmitriy"
>> <[email protected]> wrote:
>> 
>>> Andy, thank you for this resource!
>>> 
>>> Do you have an estimate of what percentage of UMLS concepts were left
>>> out?
>>> 
>>> Dima
>>> 
>>> 
>>> 
>>> 
>>> On Nov 11, 2014, at 16:02, andy mcmurry <[email protected]> wrote:
>>> 
>>>> Hello!
>>>> 
>>>> https://bitbucket.org/invitae/medgen-mysql (Apache Licensed ASL2)
>>>> 
>>>> We just released a new library containing a huge chunk of UMLS concepts
>>>> which are available without registering accounts/username/passwords.
>>>> LEGALLY. Yes, really!
>>>> 
>>>> The subset is from NCBI and it contains *thousands of concepts from
>>>> SNOMED and other vocabularies*.
>>>> 
>>>> The code is essentially
>>>> 1. a list of WGET targets to various NCBI FTP site mirrors
>>>> 2. Makefile for building the databases of interest
>>>> 
>>>> Our legal team has approved distribution for Open Access work, ASL2
>>>> LICENSE.
>>>> 
>>>> I recommend we use this opportunity to make this the default distribution
>>>> for CTAKES UMLS connections, because it obviates the need for so much
>>>> painful credentialing and back and forth agreements with the US National
>>>> Library of Medicine.
>>>> 
>>>> Cheers!
>>>> --Andy
>>>> 
>>>> 
>>>> On Wed, Sep 10, 2014 at 12:13 PM, Masanz, James J.
>>>> <[email protected]>
>>>> wrote:
>>>> 
>>>>> 
>>>>> I would love to see the install be as simple as apt-get install, so that
>>>>> users end up with a working dictionary that has more than a handful of
>>>>> entries to get them started.
>>>>> 
>>>>> Regards,
>>>>> James Masanz
>>>>> 
>>>>> -----Original Message-----
>>>>> From: andy mcmurry [mailto:[email protected]]
>>>>> Sent: Tuesday, September 09, 2014 4:32 PM
>>>>> To: [email protected]
>>>>> Subject: Recommendation for ctakes default (UMLS) dictionaries
>>>>> 
>>>>> Greetings ctakes-dev:
>>>>> 
>>>>> *UMLS license restrictions have been getting more lax over the years --*
>>>>> much of the UMLS can be downloaded directly from the NCBI official FTP
>>>>> site.
>>>>> 
>>>>> In fact, the NIH (and implicitly the NLM) *have already made the standard
>>>>> terms public for some medical specialties*.
>>>>> 
>>>>> For example, here is the UMLS subset specific to Medical Genetics (MedGen)
>>>>> and Genetic Testing (GTR), complete with SNOMED-CT concept CUI(s), names,
>>>>> etc.:
>>>>> 
>>>>> [  ftp://ftp.ncbi.nlm.nih.gov/pub/medgen/README.html  ]
>>>>> 
>>>>> My team has developed a JVM based wrapper for MetaMap 2013AB which I
>>>>> intend to open source soon (Clojure).  It includes REST support for
>>>>> invoking MetaMap with any or all of the command line arguments.
>>>>> We do not integrate with UIMA; we are basically a wrapper around the
>>>>> binary installation of MetaMap. The emphasis is on publication text, not
>>>>> clinical text. Still, some services are common (such as LVG).
>>>>> 
>>>>> Strangely, the NLM still requires UMLS licenses to download MetaMap
>>>>> execution binaries. The MetaMap binary install is better, but customizing
>>>>> dictionaries (DataFileBuilder) is not as easy as it is in CTAKES with YTEX.
>>>>> 
>>>>> [ 
>>>>> https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation ]
>>>>> 
>>>>> *** Hence, there is a real opportunity here to enable Apache cTAKES to
>>>>> have a stronger default dictionary. ***
>>>>> 
>>>>> Imagine if we could
>>>>> *$ apt-get install apache-ctakes *
>>>>> 
>>>>> and instantly have a working package for SOME problem domain.
>>>>> In my case (Medical Genetics) the UMLS definitions are already available
>>>>> and the UMLS license problem becomes a non-issue, at least for many
>>>>> first-time users.
>>>>> 
>>>>> Your thoughts?
>>>>> AndyMC
>>>>> 
>>> 
>> 
> 
