(a) I'm not sure about morphology in particular; I don't recall any stray 
values there.
(b) re goofy data in general, we typically put some effort into getting it 
corrected upstream, but meanwhile, the HERON code is littered with "if it's 
less than, say, 0.1%, sweep it under the rug" checks.
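
For illustration only, here is roughly what such a check amounts to, sketched in Python. This is not the actual HERON code; the function name `within_tolerance` and the default threshold are assumptions made up for this example, with 0.1% taken from the "say, 0.1%" figure above.

```python
def within_tolerance(bad_count, total_count, threshold=0.001):
    """Return True when the share of bad records is below the threshold.

    threshold=0.001 corresponds to 0.1%; the name and default are
    hypothetical, not taken from the HERON code base.
    """
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return bad_count / total_count < threshold


if __name__ == "__main__":
    # 5 bad rows out of 60,000 is well under 0.1%, so it would be tolerated.
    print(within_tolerance(5, 60000))
    # 200 bad rows out of 60,000 is about 0.33%, so it would not be.
    print(within_tolerance(200, 60000))
```

A fixed cutoff like this is easy to write but is the kind of thing that does not port well between sites, which is what the ticket below is about.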

See also:

  *   #88 HERON ETL SQL data check thresholds are not portable and not always 
applicable <https://informatics.gpcnetwork.org/trac/Project/ticket/88>

which should tell more of this story than it currently does...

--
Dan

________________________________
From: [email protected] [[email protected]] on 
behalf of Lenon Patrick [[email protected]]
Sent: Tuesday, March 10, 2015 9:00 AM
To: [email protected]
Subject: NAACCR 0521 Morphology - extra values

On checking fact table concepts vs. concept dimension for UW NAACCR, we have 
~200 fact records (out of 60,000) whose concepts don’t match up.  In other 
words, their 521-Morphology code value (actually 522-Histology plus 
523-Behavior) doesn’t match any value in the ICDO_O_MORPH table set up in the 
naaccr_concepts_load.sql script.  Examples are 98153, 80412, 80722.  The most 
likely explanation is data errors of some kind.
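
The comparison described above can be sketched in Python as a simple set difference. This is a minimal illustration, not the query actually run: the function name `unmatched_morphology` is hypothetical, and the "known" codes in the usage example are stand-ins for the contents of ICDO_O_MORPH, which are not reproduced here. Only 98153, 80412 and 80722 come from this message.

```python
def unmatched_morphology(fact_codes, known_codes):
    """Return the distinct fact codes that have no match among known codes.

    fact_codes: iterable of 5-character strings (4-digit histology
                followed by 1-digit behavior), one per fact record.
    known_codes: iterable of codes present in the concept table.
    """
    known = set(known_codes)
    return sorted({code for code in fact_codes if code not in known})


if __name__ == "__main__":
    # Stand-in values for the concept table; the real table is much larger.
    known = ["80703", "81403"]
    facts = ["80703", "98153", "80412", "81403", "80722", "98153"]
    print(unmatched_morphology(facts, known))
    # prints ['80412', '80722', '98153']
```

Listing the distinct unmatched codes, rather than only counting records, makes it easier to see whether the mismatches cluster on a few values or are scattered.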

So first, have others experienced this sort of thing with field 521?  Are there 
other possible explanations?

Second, assuming this data is defective, is there a protocol or principle to 
follow with such records, e.g., “gently correct them” vs. “nuke them on sight”?


Patrick Lenon
HIMC Informatics Specialist
608 890 5671

_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
