While QA-ing our NAACCR data, we found some apparent duplicates in the NAACCR ontology produced by the HERON code. By "duplicates" I mean two metadata records with the same c_fullname that are not synonyms of each other.
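For anyone who wants to reproduce the check, here is a minimal sketch of the duplicate definition above. The table name `naaccr_ontology` and the sample rows are hypothetical; the rows borrow the concept names quoted in this message, and the c_fullname paths are made up. Only `c_fullname`, `c_name` and `c_synonym_cd` are modeled, assuming the usual i2b2 convention that `c_synonym_cd` is 'N' for a primary term and 'Y' for a synonym. It uses an in-memory SQLite database so it runs standalone; the `GROUP BY ... HAVING` query itself should carry over to the real metadata table once the actual table name is substituted.

```python
import sqlite3

# Hypothetical stand-in for the NAACCR ontology (metadata) table.
# Only the columns needed for the duplicate check are modeled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE naaccr_ontology (c_fullname TEXT, c_name TEXT, c_synonym_cd TEXT)"
)
conn.executemany(
    "INSERT INTO naaccr_ontology VALUES (?, ?, ?)",
    [
        # Illustrative rows only (paths are hypothetical).
        # Two primary terms that collide on c_fullname: "tumour" vs. "tumor".
        ("\\naaccr\\morph\\8010/0\\", "8010/0 Epithelial tumour, benign", "N"),
        ("\\naaccr\\morph\\8010/0\\", "8010/0 Epithelial tumor, benign", "N"),
        # Same c_fullname, but the second row is flagged as a synonym,
        # so this pair should NOT be reported.
        ("\\naaccr\\morph\\8010/6\\", "8010/6 Carcinoma, metastatic NOS", "N"),
        ("\\naaccr\\morph\\8010/6\\", "8010/6 Carcinoma, metastatic, NOS", "Y"),
    ],
)

# A "duplicate" = two or more non-synonym rows sharing a c_fullname.
DUPLICATE_QUERY = """
    SELECT c_fullname, COUNT(*) AS n
    FROM naaccr_ontology
    WHERE c_synonym_cd = 'N'
    GROUP BY c_fullname
    HAVING COUNT(*) > 1
    ORDER BY c_fullname
"""

duplicates = conn.execute(DUPLICATE_QUERY).fetchall()
for fullname, n in duplicates:
    print(fullname, n)

conn.close()
```

Only the 8010/0 pair is reported. The 8010/6 pair also shares a c_fullname, but because one of its rows is marked as a synonym it falls outside the definition of a duplicate.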
These appear to be caused by minor differences in spelling and punctuation in ICD_O_MORPH, such as "tumor" vs. "tumour." For example, this query:

    SELECT * FROM I2B2_DEV_ETL..ICD_O_MORPH icdo where icdo.CONCEPT_NAME like '%8010%' order by concept_cd;

yields records with concept_name '8010/0 Epithelial tumour, benign' and '8010/0 Epithelial tumor, benign', the only difference being the British spelling "tumour." There are other minor differences, such as an extra comma:

    '8010/6 Carcinoma, metastatic NOS'
    '8010/6 Carcinoma, metastatic, NOS'

These in turn come from slight differences between MORPH2 and MORPH3, a.k.a. ICD-O-2 and ICD-O-3.

So what, if anything, did you folks do about this? Essentially these are synonyms (if I understand I2B2 synonyms correctly). Are they useful as such? Or did you just end up nuking the extras on more or less arbitrary criteria (like dropping all the "tumour" entries)?

Patrick Lenon
HIMC Informatics Specialist
608 890 5671
_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
