Hi all, hoping some of you are in a good state of mind to share your experience, brilliance, shattering good looks... enough flattery?
Anyway, my first attempt at building a NAACCR ontology was disappointing: the NAACCR tables were not as helpful as I expected. The fundamental problem is that there is a lot of junk in the column where the code value normally resides. Besides codes, that column contains:

* Code ranges (e.g. "1-100, 110-12")
* References to outside sources (WHO in particular)
* The word "BLANK", presumably to indicate the field is optional?
* Stray ".." and "*" entries
* Comments, sometimes with HTML markup

Looking at the heron naaccr_txform and naaccr_concepts_load scripts, the ultimate NAACCR ontology consists of all unique base/concept codes found in the imported NAACCR data file (now in table tumor_reg_codes), merged with some fields from either the NAACCR code table (naaccr.t_code) or one of the external tables (e.g., WHO.TOPO).

So, some questions arise:

1) First, is the above description reasonably accurate? Please point out any glaring errors. I did leave out some details, such as special cases.

2) I looked at WHO's site and began applying for access to their tables as listed in heron\heron_staging\tumor_reg\icd_o_meta.py. However, before I continue: will every site have to do this individually? Tom Mish is of the opinion that the answer is YES. But does anyone have a legal/kosher/ethical way for us not to duplicate this effort?

3) If I'm correct that only codes that appear in the registry data will be loaded into the ontology, well, is that OK? This fixes the problems with code ranges and non-code values, and the ontology tree is effectively pre-trimmed. But what are the implications for future loads of new data? Has this discussion already happened?

4) Bonus question: What the heck is naaccr_shortcuts.csv? It looks very useful, but I have no idea what for.

Thanks in advance for any input all of y'all provide.

Patrick Lenon
HIMC Informatics Specialist
608 890 5671
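To make question 3 concrete, here is a minimal sketch of the "only load codes that appear in the registry data" approach, assuming the code table can be read as (item, code, description) rows. CodeRow, classify, in_range and build_terms are hypothetical names made up for illustration; they are not taken from naaccr_txform or naaccr_concepts_load, and the classification rules are rough heuristics, not NAACCR rules. Each observed code is matched exactly first, then against any numeric range in the code column; anything left over gets a placeholder label.

```python
import re
from dataclasses import dataclass


# Hypothetical shape of one code-table row (e.g. from naaccr.t_code):
# the NAACCR item, the raw "code" column, and its description.
@dataclass
class CodeRow:
    item: str
    code: str
    desc: str


RANGE = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*$")  # one "lo-hi" piece
TAG = re.compile(r"<[^>]+>")                      # crude HTML tag matcher


def classify(code: str) -> str:
    """Heuristically label what sits in the code column."""
    c = code.strip()
    if c.upper() == "BLANK":
        return "blank"
    if TAG.search(c):
        return "comment"
    parts = [p.strip() for p in c.split(",")]
    # A comma-separated list of numbers where at least one piece is "lo-hi".
    if all(RANGE.match(p) or p.isdigit() for p in parts) and any(RANGE.match(p) for p in parts):
        return "range"
    if re.fullmatch(r"[0-9A-Za-z]+", c):
        return "code"
    return "other"  # WHO references, free-text comments, stray punctuation


def in_range(value: str, spec: str) -> bool:
    """True if a numeric value falls inside any piece of a range spec."""
    if not value.isdigit():
        return False
    v = int(value)
    for piece in spec.split(","):
        m = RANGE.match(piece)
        if m and int(m.group(1)) <= v <= int(m.group(2)):
            return True
        if piece.strip().isdigit() and int(piece) == v:
            return True
    return False


def build_terms(observed, code_table):
    """Build ontology terms only for (item, code) pairs seen in the data.

    observed: iterable of (item, code) pairs, e.g. distinct tumor_reg_codes rows.
    Returns {(item, code): label}.
    """
    by_item = {}
    for row in code_table:
        by_item.setdefault(row.item, []).append(row)
    terms = {}
    for item, code in sorted(set(observed)):
        rows = by_item.get(item, [])
        label = None
        # Prefer an exact code match...
        for row in rows:
            if classify(row.code) == "code" and row.code.strip() == code:
                label = TAG.sub("", row.desc).strip()
                break
        # ...then fall back to a range entry that covers the value.
        if label is None:
            for row in rows:
                if classify(row.code) == "range" and in_range(code, row.code):
                    label = TAG.sub("", row.desc).strip()
                    break
        terms[(item, code)] = label if label is not None else f"{item}={code} (no label)"
    return terms


if __name__ == "__main__":
    # Toy rows, made up for illustration only.
    table = [
        CodeRow("item_a", "1", "<b>First</b> value"),
        CodeRow("item_a", "10-20", "Range of values"),
        CodeRow("item_a", "BLANK", "Not applicable"),
    ]
    for key, label in build_terms([("item_a", "1"), ("item_a", "15"), ("item_a", "99")], table).items():
        print(key, label)
```

With the toy rows, code 1 picks up its exact label (HTML stripped), 15 falls into the 10-20 range, and 99 ends up unlabeled. That last case is the one a future data load would hit whenever it brings in a code the earlier loads never saw.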
_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
