You say that like it's a bad thing. ;-) We didn't do anything about this situation; never noticed it, and I don't see how it's a problem.
If the ICDO gods say there are two ways to spell tumour, who are we to say otherwise? You might try reporting the situation to them and see what they say.

Multiple terms with the same c_fullname do turn into a problem when we populate the concept_dimension from the metadata table. In that case, we pick arbitrarily (using min). See concepts_activate.sql around line 70 <https://informatics.kumc.edu/work/browser/heron_load/concepts_activate.sql#L70>.

This is not specific to NAACCR. We ran into it somewhere else, I think, though I don't recall where.

p.s. I happen to not have ready access to our production database so I used this query on babel to see the situation:

    SELECT * FROM i2b2metadata.heron_terms ht
    where ht.c_basecode like '%80100%'
    and ht.c_fullname like '\\i2b2\\naaccr\\%'
    order by ht.c_basecode
    limit 100

-- Dan

________________________________
From: [email protected] [[email protected]] on behalf of Lenon Patrick [[email protected]]
Sent: Monday, May 04, 2015 2:41 PM
To: [email protected]
Cc: Yoshihara Deborah L
Subject: Duplicates or synonyms in NAACCR ICD_O_MORPH?

In QA-ing our NAACCR data, we found some apparent duplicates in the NAACCR ontology as produced by the heron code. “Duplicates” being defined as two metadata records with the same c_fullname (not synonyms).

These appear to be caused by minor differences in spelling and punctuation in ICD_O_MORPH, like “tumor” vs. “tumour.” For example, this query:

    SELECT * FROM I2B2_DEV_ETL..ICD_O_MORPH icdo
    where icdo.CONCEPT_NAME like '%8010%'
    order by concept_cd;

yields records with concept_name of

    '8010/0 Epithelial tumour, benign'
    '8010/0 Epithelial tumor, benign'

with the only difference being the English spelling of “tumour.” There are other minor differences like

    '8010/6 Carcinoma, metastatic NOS'
    '8010/6 Carcinoma, metastatic, NOS' /* extra comma */

This in turn was caused by slight differences between MORPH2 and MORPH3, aka ICD-O-2 and ICD-O-3.

So what, if anything, did you folks do with this? Essentially they’re synonyms (if I understand I2B2 synonyms correctly). Are they useful as such? Or did you just wind up nuking the extras on more or less random criteria (like getting rid of all the “tumour” entries or some such)?

Patrick Lenon
HIMC Informatics Specialist
608 890 5671
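For anyone who wants to reproduce the situation discussed in this thread on a scratch database, here is a minimal sketch in Python with an in-memory SQLite table. It is not the code in concepts_activate.sql. The table name `terms`, its two columns, the c_fullname paths, and the helper function names are all assumptions made up for illustration; only the four concept names come from the message above. It shows one way to list c_fullname values that occur more than once and what an arbitrary pick using min would keep.

```python
import sqlite3

# Hypothetical miniature of a metadata table. The paths are invented for
# this sketch; the four 8010 names are the ones quoted in the thread.
ROWS = [
    ("\\i2b2\\naaccr\\morph\\8010/0\\", "8010/0 Epithelial tumour, benign"),
    ("\\i2b2\\naaccr\\morph\\8010/0\\", "8010/0 Epithelial tumor, benign"),
    ("\\i2b2\\naaccr\\morph\\8010/6\\", "8010/6 Carcinoma, metastatic NOS"),
    ("\\i2b2\\naaccr\\morph\\8010/6\\", "8010/6 Carcinoma, metastatic, NOS"),
    ("\\demo\\unique\\", "made-up term with no duplicate"),
]


def load(rows):
    """Create an in-memory table and fill it with (c_fullname, c_name) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE terms (c_fullname TEXT, c_name TEXT)")
    conn.executemany("INSERT INTO terms VALUES (?, ?)", rows)
    return conn


def duplicated_paths(conn):
    """Return (c_fullname, row count) for every path that appears more than once."""
    sql = """
        SELECT c_fullname, count(*)
        FROM terms
        GROUP BY c_fullname
        HAVING count(*) > 1
        ORDER BY c_fullname
    """
    return conn.execute(sql).fetchall()


def one_name_per_path(conn):
    """Keep a single name per path, chosen arbitrarily with min()."""
    sql = """
        SELECT c_fullname, min(c_name)
        FROM terms
        GROUP BY c_fullname
        ORDER BY c_fullname
    """
    return dict(conn.execute(sql).fetchall())


demo = load(ROWS)
for path, n in duplicated_paths(demo):
    print(n, "rows share", path)
for path, name in one_name_per_path(demo).items():
    print(path, "->", name)
```

With SQLite's default byte-wise text comparison, min happens to keep the "tumor" spelling and the variant without the extra comma. Another database or collation could order these strings differently, which is part of why the pick is arbitrary rather than meaningful.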
_______________________________________________ Gpc-dev mailing list [email protected] http://listserv.kumc.edu/mailman/listinfo/gpc-dev
