John - I believe that was the thinking.

Andy - Just to confirm: is the raw content of this dataset released under ASL 2.0? I.e., can you contribute it as a CSV or similar so that cTAKES may re-tokenize it using the same PTB rules, format it for cTAKES' dictionary lookup, etc., and then redistribute it under the same license?

> -----Original Message-----
> From: John Green [mailto:[email protected]]
> Sent: Thursday, November 13, 2014 1:55 PM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: Announcement: UMLS MedGen-MySQL dataset now available
> as open access download
>
> The old licensed setup would be kept as a packaged option? Much as it is
> now.... With the unlicensed going out in place of the current "free"
> dictionary? Am I understanding that right?
>
>
> JG
> --
> Sent from Mailbox
>
> On Thu, Nov 13, 2014 at 1:40 PM, andy mcmurry <[email protected]>
> wrote:
>
> > I'll crunch the numbers -- in the meantime I can tell you that
> > phenotypes vary by semantic type. clinical attributes from SNOMED are
> > abundant, many concepts in mesh that are mapped to diseases. Tons of
> > "pharmacological substances"
> > On Nov 12, 2014 6:19 AM, "Dligach, Dmitriy" <[email protected]> wrote:
> >> Andy, thank you for this resource!
> >>
> >> Do you have an estimate of what percentage of UMLS concepts were left out?
> >>
> >> Dima
> >>
> >>
> >> On Nov 11, 2014, at 16:02, andy mcmurry <[email protected]> wrote:
> >>
> >> > Hello!
> >> >
> >> > https://bitbucket.org/invitae/medgen-mysql (Apache Licensed ASL2)
> >> >
> >> > We just released a new library containing a huge chunk of UMLS
> >> > concepts which are available without registering accounts/username/passwords.
> >> > LEGALLY. Yes, really!
> >> >
> >> > The subset is from NCBI and it contains *thousands of concepts from SNOMED
> >> > and other vocabularies*.
> >> >
> >> > The code is essentially
> >> > 1. a list of WGET targets to various NCBI FTP site mirrors
> >> > 2. Makefile for building the databases of interest
> >> >
> >> > Our legal team has approved distribution for Open Access work, ASL2
> >> > LICENSE.
> >> >
> >> > I recommend we use this opportunity to make this the default
> >> > distribution for CTAKES UMLS connections, because it obviates the
> >> > need for so much painful credentialing and back and forth
> >> > agreements with the US National Library of Medicine.
> >> >
> >> > Cheers!
> >> > --Andy
> >> >
> >> >
> >> > On Wed, Sep 10, 2014 at 12:13 PM, Masanz, James J. <[email protected]>
> >> > wrote:
> >> >
> >> >>
> >> >> I would love to see the install be as simple as apt-get install to end up
> >> >> with some working dictionary that have more than a handful of
> >> >> entries to get them started.
> >> >>
> >> >> Regards,
> >> >> James Masanz
> >> >>
> >> >> -----Original Message-----
> >> >> From: andy mcmurry [mailto:[email protected]]
> >> >> Sent: Tuesday, September 09, 2014 4:32 PM
> >> >> To: [email protected]
> >> >> Subject: Recommendation for ctakes default (UMLS) dictionaries
> >> >>
> >> >> Greetings ctakes-dev:
> >> >>
> >> >> *UMLS license restrictions have been getting more lax over the
> >> >> years -- *much of the UMLS can be downloaded directly from the
> >> >> NCBI official FTP site.
> >> >>
> >> >> In fact, the NIH (and implicitly the NLM) *have already made the standard
> >> >> terms public for some medical specialities*.
> >> >>
> >> >> For example: Here is the UMLS subset specific to Medical Genetics (MedGen)
> >> >> and Genetic Testing (GTR) complete with SNOMED-CT concept CUI(s) and names,
> >> >> etc :
> >> >>
> >> >> [ ftp://ftp.ncbi.nlm.nih.gov/pub/medgen/README.html ]
> >> >>
> >> >> My team has developed a JVM based wrapper for MetaMap 2013AB which
> >> >> I intend to open source soon (Clojure). It includes REST support
> >> >> for invoking MetaMap with any or all of the command line arguments.
> >> >> We do not integrate with UIMA, we are basically a wrapper around
> >> >> the binary installation of MetaMap. The emphasis is on publication
> >> >> text not clinical text, still, some services are common (such as LVG).
> >> >>
> >> >> Strangely, the NLM still requires UMLS licenses to download
> >> >> MetaMap execution binaries. The MetaMap binary install is better
> >> >> but customizing dictionaries (DataFileBuilder) is not as easy to
> >> >> use as CTAKES with YTEXT
> >> >>
> >> >> [ https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation ]
> >> >>
> >> >> *** Hence, there is a real opportunity here to enable Apache
> >> >> cTAKES to have a stronger default dictionary. ** *
> >> >>
> >> >> Imagine if we could
> >> >> *$ apt-get install apache-ctakes *
> >> >>
> >> >> and instantly have a working package for SOME problem domain.
> >> >> In my case (Medical Genetics) the UMLS definitions are already
> >> >> available and the UMLS license problem becomes a non issue, at
> >> >> least for many first time users
> >> >>
> >> >> Your thoughts?
> >> >> AndyMC
> >> >>
> >>
