Greetings all,

Some of you may be aware of WordNet::Similarity's younger and more brawny cousin, UMLS::Similarity. It uses the Unified Medical Language System (UMLS) from the National Library of Medicine (NLM, part of NIH) to compute measures of similarity and relatedness between concepts found in the ontologies and terminologies used in the medical domain (such as MeSH and SNOMED-CT). I say it is brawnier because the UMLS offers about 2 million different concepts spread over at least 100 different sources, so the volume of data is considerably larger than WordNet's. It is also focused on the medical domain, although it does include some concepts that cross over into general English.
You can find more information about the UMLS::Similarity package here:

McInnes, Pedersen, and Pakhomov. UMLS-Interface and UMLS-Similarity: Open Source Software for Measuring Paths and Semantic Similarity. Proceedings of the Annual Symposium of the American Medical Informatics Association, Nov 14-18, 2009, pp. 431-435, San Francisco, CA.
http://www.d.umn.edu/~tpederse/Pubs/amia09.pdf

We now have a web interface running for UMLS::Similarity that is very similar (no pun intended?) to the WordNet::Similarity web interface that we've supported for quite a few years now. Of course, you can always install UMLS::Similarity locally, assuming that you have already installed the UMLS. The web interface is running version 2011AA of the UMLS (the most current version), and supports a few popular sources. You can use it to experiment a bit before deciding whether to install the entire package:

http://atlas.ahc.umn.edu/cgi-bin/umls_similarity.cgi

We also have a script called query-umls-similarity-webinterface.pl that automatically queries the web interface. You can think of it as a network-based version of the similarity.pl utility that you may have used with WordNet::Similarity. The nice thing about this is that you can automate quite a few queries and just let them run on our server, without having to install the UMLS or even UMLS::Similarity on your own system.
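If you want to automate a batch of queries, one option is a small wrapper that calls the script once per word pair. Below is a minimal sketch in Python, assuming query-umls-similarity-webinterface.pl is installed, executable, and on your PATH, and using only the --measure, --sab and --rel options described in the cheat sheet below. It does not assume anything about the script's output format; it just collects the raw text each query prints. The function names and the word pairs are hypothetical examples.

```python
import shutil
import subprocess

# Assumption: the script is installed and reachable on PATH under this name.
SCRIPT = "query-umls-similarity-webinterface.pl"


def build_command(term1, term2, measure=None, sab=None, rel=None):
    """Build the argument list for one query (hypothetical helper).

    Options left as None are omitted, so the script falls back to its
    own defaults (the path measure over MSH PAR/CHD, per the cheat sheet).
    """
    cmd = [SCRIPT]
    if measure is not None:
        cmd += ["--measure", measure]
    if sab is not None:
        cmd += ["--sab", sab]
    if rel is not None:
        cmd += ["--rel", rel]
    cmd += [term1, term2]
    return cmd


def run_batch(pairs, **options):
    """Run one query per pair and return ((term1, term2), raw output) tuples."""
    results = []
    for term1, term2 in pairs:
        completed = subprocess.run(
            build_command(term1, term2, **options),
            capture_output=True, text=True, check=True,
        )
        results.append(((term1, term2), completed.stdout.strip()))
    return results


if __name__ == "__main__":
    pairs = [("hand", "skull"), ("heart", "lung")]  # hypothetical example pairs
    if shutil.which(SCRIPT) is None:
        # Script not installed: just show the commands that would be run.
        for term1, term2 in pairs:
            print(" ".join(build_command(term1, term2, measure="lesk")))
    else:
        for pair, output in run_batch(pairs, measure="lesk"):
            print(pair, output)
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems if a term ever contains spaces or shell metacharacters.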
Below is a small cheat sheet that Bridget McInnes has put together on how to get started using this program, which you can find at:

http://search.cpan.org/dist/UMLS-Similarity/utils/query-umls-similarity-webinterface.pl

Note that you can find the entire UMLS::Similarity package at the site below, although to run the automatic queries you only need the program above:

http://search.cpan.org/dist/UMLS-Similarity

===============
BASIC EXAMPLE
----------------------------------------------------------

The simplest case is:

    query-umls-similarity-webinterface.pl hand skull

which returns the similarity between 'hand' and 'skull' using the path measure, where the path information is obtained from the PAR/CHD relations in MSH.

MODIFY THE DEFAULT MEASURE
----------------------------------------------------------

There are a number of additional similarity and relatedness measures that you can use: Leacock & Chodorow (lch), Wu & Palmer (wup), Lin (lin), Resnik (res), Jiang & Conrath (jcn), Lesk (lesk), and the Vector Measure (vector). To change the measure, use the --measure option. For example:

    query-umls-similarity-webinterface.pl --measure lesk hand skull

MODIFY THE DEFAULT SOURCE/RELATIONS
----------------------------------------------------------

There are also a number of additional SOURCE/RELATION options that you can use. For the similarity measures, you can use:

    Source      Relations
    -----------------------
    SNOMEDCT    PAR/CHD
    SNOMEDCT    RB/RN
    MSH         PAR/CHD
    MSH         RB/RN
    FMA         PAR/CHD
    FMA         RB/RN
    OMIM        PAR/CHD
    OMIM        RB/RN

This means that the path information will be obtained from the specified source by following the specified relations.

For the relatedness measures, things are a little different: the relations specify which relations the extended definition of each concept is built from.
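To make the idea of an extended definition a bit more concrete, here is a minimal sketch in Python on made-up toy data. It assumes, as the relation strings suggest, that CUI stands for a concept's own definition and that each other relation (PAR, CHD, RB, RN) pulls in the definitions of the concepts reached through it. The scoring is a simplified Lesk-style overlap (a count of shared non-stopword words) and does not attempt to reproduce the scoring UMLS::Similarity actually uses. The definitions, the relations, and the use of plain terms instead of CUIs are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy data; the real definitions and relations come from the UMLS.
DEFINITIONS = {
    "hand": "the distal part of the upper limb",
    "upper limb": "the limb of the body attached to the shoulder",
    "skull": "the bony framework of the head",
    "head": "the upper part of the body that contains the brain",
}

# RELATIONS[concept][rel] lists the concepts reached from `concept` via `rel`.
RELATIONS = {
    "hand": {"PAR": ["upper limb"]},
    "skull": {"PAR": ["head"]},
}

STOPWORDS = {"the", "of", "that", "to"}


def extended_definition(concept, rels):
    """Concatenate the definitions selected by a string like "CUI/PAR/CHD/RB/RN"."""
    parts = []
    for rel in rels.split("/"):
        if rel == "CUI":
            # The concept's own definition.
            parts.append(DEFINITIONS.get(concept, ""))
        else:
            # Definitions of the concepts reached through this relation.
            for related in RELATIONS.get(concept, {}).get(rel, []):
                parts.append(DEFINITIONS.get(related, ""))
    return " ".join(parts)


def bag_of_words(text):
    """Lowercased word counts with stopwords removed."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)


def overlap_score(c1, c2, rels="CUI/PAR/CHD/RB/RN"):
    """Simplified Lesk-style relatedness: number of shared words."""
    shared = bag_of_words(extended_definition(c1, rels)) & bag_of_words(extended_definition(c2, rels))
    return sum(shared.values())


if __name__ == "__main__":
    print(overlap_score("hand", "skull", "CUI"))  # 0
    print(overlap_score("hand", "skull"))         # 3
```

With only CUI, the two toy definitions share no content words; adding the parents' definitions brings in "part", "upper" and "body" on both sides, so the score rises to 3. This is the kind of difference that choosing CUI versus CUI/PAR/CHD/RB/RN controls.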
In the interface, you can use:

    Source      Relations
    -----------------------
    SNOMEDCT    CUI/PAR/CHD/RB/RN
    SNOMEDCT    CUI
    MSH         CUI/PAR/CHD/RB/RN
    MSH         CUI
    UMLS_ALL    CUI/PAR/CHD/RB/RN
    UMLS_ALL    CUI

To change the source and relations, use the --sab and --rel options. For example:

    query-umls-similarity-webinterface.pl --sab SNOMEDCT --rel PAR/CHD hand skull

===========

Please let us know if you have any questions about the web interface, the automatic query program, or anything else about this package. Our goal with the web interface and automatic query program is to make this easy to use, so if there is anything we can do to further that end, please let us know!

There is a mailing list for UMLS::Similarity that you might want to join, particularly if you have questions about the package or the web interface:

http://tech.groups.yahoo.com/group/umls-similarity/

Enjoy,
Ted

--
Ted Pedersen
http://www.d.umn.edu/~tpederse