(a) I'm not sure about morphology in particular; I don't recall any stray values there.
(b) Re goofy data in general: we typically put some effort into getting it corrected upstream, but meanwhile, the HERON code is littered with "if it's less than, say, 0.1%, sweep it under the rug" checks.
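For readers unfamiliar with that pattern, here is a minimal sketch of such a "sweep it under the rug" check. The helper names `within_tolerance` and `check_bad_rows` are hypothetical, and the 0.1% default only mirrors the "say, 0.1%" figure above. This is not the actual HERON implementation, which does these checks in ETL SQL. The threshold is a parameter because a single fixed limit does not suit every site or every check.

```python
def within_tolerance(bad_rows: int, total_rows: int, threshold: float = 0.001) -> bool:
    """Return True if the share of bad rows is at or below the threshold.

    The 0.001 (0.1%) default is illustrative, not a value taken from HERON.
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return bad_rows / total_rows <= threshold


def check_bad_rows(bad_rows: int, total_rows: int, threshold: float = 0.001) -> None:
    """Let the load continue if the bad rows are rare; otherwise stop it loudly."""
    if not within_tolerance(bad_rows, total_rows, threshold):
        pct = 100.0 * bad_rows / total_rows
        raise RuntimeError(
            f"{bad_rows} of {total_rows} rows ({pct:.2f}%) failed the check; "
            f"limit is {100.0 * threshold:.2f}%"
        )
```

For example, ~200 mismatched facts out of 60,000 is about 0.33%. That would trip a 0.1% limit, so `check_bad_rows(200, 60000)` raises, while 50 out of 60,000 would pass silently.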
See also:
 * #88 HERON ETL SQL data check thresholds are not portable and not always applicable
   <https://informatics.gpcnetwork.org/trac/Project/ticket/88>
   which should tell more of this story than it currently does...

-- Dan

________________________________
From: [email protected] [[email protected]] on behalf of Lenon Patrick [[email protected]]
Sent: Tuesday, March 10, 2015 9:00 AM
To: [email protected]
Subject: NAACCR 0521 Morphology - extra values

On checking fact table concepts vs. concept dimension for UW NAACCR, we have ~200 fact records (out of 60,000) whose concepts don't match up. In other words, their 521-Morphology code value (actually 522-Histology plus 523-Behavior) doesn't match any value in the ICDO_O_MORPH table set up in the naaccr_concepts_load.sql script. Examples are 98153, 80412, 80722. The most likely explanation is data errors of some kind.

So first, have others experienced this sort of thing with field 521? Are there other possible explanations?

Second, assuming this data is defective, is there a protocol or principle to follow with such records? i.e., "gently correct them" vs. "nuke them on sight"?

Patrick Lenon
HIMC Informatics Specialist
608 890 5671
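The "fact table concepts vs. concept dimension" comparison Patrick describes is an anti-join: facts whose concept code has no matching row in the concept dimension. Below is a minimal sketch using an in-memory SQLite database. It assumes a stripped-down i2b2-style `observation_fact` / `concept_dimension` pair. The `NAACCR|521:` concept-code prefix and the helpers `build_demo_db`, `find_orphans` and `split_morphology` are hypothetical, and the real NAACCR concept naming may differ. It also splits each 5-digit item 521 value into the 4-digit 522 histology and 1-digit 523 behavior parts, so 98153 becomes 9815 and 3.

```python
import sqlite3

# Hypothetical miniature of the i2b2 star schema: only the columns needed here.
SCHEMA = """
CREATE TABLE concept_dimension (concept_cd TEXT PRIMARY KEY, name_char TEXT);
CREATE TABLE observation_fact (patient_num INTEGER, concept_cd TEXT);
"""

# Anti-join: facts whose concept_cd has no row in concept_dimension.
ORPHAN_QUERY = """
SELECT f.concept_cd, COUNT(*) AS n
FROM observation_fact f
LEFT JOIN concept_dimension c ON c.concept_cd = f.concept_cd
WHERE c.concept_cd IS NULL
GROUP BY f.concept_cd
ORDER BY f.concept_cd
"""


def split_morphology(code: str) -> tuple:
    """Split a 5-digit item 521 value into (522 histology, 523 behavior)."""
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"unexpected morphology code: {code!r}")
    return code[:4], code[4]


def find_orphans(conn: sqlite3.Connection) -> list:
    """Return (concept_cd, fact count) for every concept missing from the dimension."""
    return conn.execute(ORPHAN_QUERY).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Build a toy database; the concept codes are made up for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO concept_dimension VALUES (?, ?)",
        ("NAACCR|521:80003", "Neoplasm, malignant"),
    )
    conn.executemany(
        "INSERT INTO observation_fact VALUES (?, ?)",
        [(1, "NAACCR|521:80003"), (2, "NAACCR|521:98153"), (3, "NAACCR|521:98153")],
    )
    return conn


if __name__ == "__main__":
    for concept_cd, n in find_orphans(build_demo_db()):
        histology, behavior = split_morphology(concept_cd.split(":")[-1])
        print(concept_cd, n, histology, behavior)
```

This only locates the mismatched facts. Whether to "gently correct them" or drop them is the policy question raised above, and the orphan count could feed a tolerance check like the one Dan describes.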
_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
