In QA-ing our NAACCR data, we found some apparent duplicates in the NAACCR 
ontology as produced by the HERON code.  By "duplicates" I mean two metadata 
records with the same c_fullname (not synonyms).

These appear to be caused by minor differences in spelling and punctuation in 
ICD_O_MORPH, like "tumor" vs. "tumour."

For example, this query:

SELECT *
  FROM I2B2_DEV_ETL..ICD_O_MORPH icdo
  WHERE icdo.CONCEPT_NAME LIKE '%8010%'
  ORDER BY concept_cd;

yields records with concept_name of '8010/0 Epithelial tumour, benign' and 
'8010/0 Epithelial tumor, benign'; the only difference is the British 
spelling "tumour."

There are other minor differences, like:
'8010/6 Carcinoma, metastatic NOS'
'8010/6 Carcinoma, metastatic, NOS'          /* extra comma */

These in turn are caused by slight differences between MORPH2 and MORPH3, aka 
ICD-O-2 and ICD-O-3.

So what, if anything, did you folks do with this?  Essentially they're synonyms 
(if I understand I2B2 synonyms correctly).  Are they useful as such?  Or did 
you just wind up nuking the extras on more or less random criteria (like 
getting rid of all the "tumour" entries or some such)?




Patrick Lenon
HIMC Informatics Specialist
608 890 5671

_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev