You could possibly use norm <http://lexsrv2.nlm.nih.gov/LexSysGroup/Projects/lvg/2015/docs/userDoc/tools/norm.html> to normalize the entity text strings. I can't vouch for its accuracy at this point, though.
Jen Seale
Presidential Research Fellow, CUNY Graduate Center
512.705.4030

On Mon, Mar 2, 2015 at 11:29 AM, Dligach, Dmitriy <[email protected]> wrote:
> Hello,
>
> Is anybody aware of a reliable way of identifying the head word of a UMLS
> entity? In the general domain, people often use Collins rules, but I'm not
> sure whether they would be applicable to clinical entities.
>
> Until recently I was under the impression that taking the last word of an
> entity would work pretty well, but now that I have looked at the data more
> closely, I am not so sure. E.g. it fails in these cases: "breast, left",
> "ductal carcinoma in situ", "carcinoma, consistent with breast primary".
>
> Dima
>
> Dmitriy (Dima) Dligach, Ph.D.
> Boston Children's Hospital and Harvard Medical School
> (617) 651-0397
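To illustrate why the last-word rule breaks on the three examples in the quoted message, here is a minimal sketch of a purely hypothetical heuristic. It is not the Collins head rules, not part of UMLS or the LVG tools, and `last_word_head`, `heuristic_head`, and `POST_MODIFIER_CUES` are invented names. It assumes that (1) comma-separated strings like "breast, left" put the head before the first comma, and (2) a post-modifier introduced by a small, hand-picked set of cue words (e.g. "in", "of", "with") follows the head.

```python
import re

# HYPOTHETICAL cue words assumed to start a post-modifier after the head noun.
# This list is illustrative only, not derived from any UMLS resource.
POST_MODIFIER_CUES = {"in", "of", "with", "without", "from", "on", "at", "to", "for"}


def _tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def last_word_head(entity: str) -> str:
    """Baseline discussed in the thread: take the last token of the entity."""
    toks = _tokens(entity)
    return toks[-1] if toks else ""


def heuristic_head(entity: str) -> str:
    """Sketch: handle comma-inverted forms and trailing post-modifiers."""
    # 1. Keep only the text before the first comma, assuming inverted forms
    #    such as "breast, left" place the head first.
    toks = _tokens(entity.split(",", 1)[0])
    # 2. Truncate at the first cue word (not in position 0), so
    #    "ductal carcinoma in situ" becomes "ductal carcinoma".
    for i, tok in enumerate(toks):
        if i > 0 and tok in POST_MODIFIER_CUES:
            toks = toks[:i]
            break
    # 3. In the remaining premodifier + noun chunk, take the last token.
    return toks[-1] if toks else ""


if __name__ == "__main__":
    for e in ["breast, left", "ductal carcinoma in situ",
              "carcinoma, consistent with breast primary", "chest pain"]:
        print(f"{e!r}: last word={last_word_head(e)!r}, heuristic={heuristic_head(e)!r}")
```

On the three quoted failure cases, this sketch returns "breast" and "carcinoma" where the last-word rule returns "left", "situ", and "primary". It is still only a pattern-based guess: the cue list would need tuning against real clinical strings, and a syntactic parser may be a more robust alternative.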
