Hi all, hoping some of you are in a good state of mind to share your 
experience, brilliance, shattering good looks... enough flattery?

Anyway, my first attempt at building a NAACCR ontology was disappointing in 
that the NAACCR tables were not as helpful as I expected.  Fundamental problem: 
A lot of junk in the column where the code value normally resides.  So in that 
column I have found (besides codes):

*         Code ranges (e.g. "1-100, 110-12")

*         References to outside sources (WHO in particular)

*         The word "BLANK", presumably to indicate the field is optional?

*         The literal string ".."

*         The literal string "*"

*         Comments, sometimes with HTML markups
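
To make the list above concrete, here is a rough Python sketch of how I might sort the values in that column into buckets. The category names, the regular expressions, and the guess that real codes are short alphanumeric strings are all my own assumptions for illustration; none of this comes from the NAACCR tables or from the heron scripts.

```python
import re

# Assumption: a range entry looks like "1-100" or "1-100, 110-12".
RANGE_RE = re.compile(r"^\d+\s*-\s*\d+(\s*,\s*\d+\s*-\s*\d+)*$")
# Assumption: anything containing a tag-like "<...>" is a comment with markup.
HTML_RE = re.compile(r"<[^>]+>")
# Assumption: an outside reference mentions WHO as a standalone word.
WHO_RE = re.compile(r"\bWHO\b")
# Assumption: a real code is a short alphanumeric string.
CODE_RE = re.compile(r"^[A-Za-z0-9]{1,10}$")


def classify(value: str) -> str:
    """Return a hypothetical category name for one code-column value."""
    v = value.strip()
    if not v:
        return "empty"
    if v.upper() == "BLANK":
        return "blank"
    if v in ("..", "*"):
        return "placeholder"
    if RANGE_RE.match(v):
        return "range"
    if WHO_RE.search(v):
        return "external_ref"
    if HTML_RE.search(v):
        return "comment"
    if CODE_RE.match(v):
        return "code"
    # Anything left over is treated as free-text commentary.
    return "comment"


if __name__ == "__main__":
    for sample in ["1-100, 110-12", "BLANK", "..", "*", "C509"]:
        print(repr(sample), "->", classify(sample))
```

The order of the checks matters: "BLANK" would otherwise pass the short-alphanumeric test and be counted as a code.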


Looking at the heron naaccr_txform and naaccr_concepts_load scripts, the 
ultimate NAACCR ontology consists of all unique base/concept codes found in the 
imported NAACCR data file (now in table tumor_reg_codes) merged with some 
fields from either the NAACCR code table (naaccr.t_code) or one of the external 
tables (e.g., WHO.TOPO).
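
As a check on my own understanding, the merge described above could be sketched in Python roughly as follows. This is not the heron code: the function name, the (item number, code) key, the fallback label, and the sample values are all invented for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Assumption: a concept is identified by (NAACCR item number, code value).
Key = Tuple[int, str]


def build_ontology(observed: Iterable[Key],
                   code_labels: Dict[Key, str]) -> List[dict]:
    """Keep one row per code actually seen in the registry data,
    attaching a label from the code table when one exists."""
    rows = []
    for item, code in sorted(set(observed)):
        label = code_labels.get((item, code))
        if label is None:
            # Hypothetical fallback for codes the code table does not describe.
            label = f"{code} (no description)"
        rows.append({"item": item, "code": code, "label": label})
    return rows


if __name__ == "__main__":
    # Made-up sample data; item 9001 and its labels are not real NAACCR content.
    observed = [(9001, "1"), (9001, "1"), (9001, "9")]
    code_labels = {(9001, "1"): "Example A", (9001, "2"): "Example B"}
    for row in build_ontology(observed, code_labels):
        print(row)
```

Note that code "2" never reaches the result because it does not occur in the data, which is the pre-trimming effect, and that range or "BLANK" entries in the code table are simply never matched.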

So, questions arise:

1)      First, is the above description reasonably accurate?  Please point out 
any glaring errors.  I did leave out some detail like special cases.

2)      I looked at WHO's site, and began applying for access to their tables 
as listed in heron\heron_staging\tumor_reg\icd_o_meta.py .  However, before I 
continue, will every site have to do this individually?  Tom Mish is of the 
opinion YES.  But, does anyone have a legal/kosher/ethical way for us to not 
duplicate this effort?

3)      If I'm correct that only codes that appear in the Registry data will be 
loaded into the ontology, well, is that OK?  This fixes the problems with code 
ranges and non-code values, and the ontology tree is effectively pre-trimmed.  
But what are the implications for future loads of new data?  Has this 
discussion already happened?

4)      Bonus question:  What the heck is naaccr_shortcuts.csv?  It looks very 
useful, but I have no idea what for.
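
Coming back to question 3: if the ontology only holds codes seen so far, I imagine each future load would need a check along these lines before (or instead of) a full rebuild. This is only a sketch under that assumption; the function name and the (item number, code) key are hypothetical, and I do not know whether the heron scripts do anything like this.

```python
from typing import Iterable, List, Tuple

# Assumption: a concept is identified by (NAACCR item number, code value).
Key = Tuple[int, str]


def codes_missing_from_ontology(new_load: Iterable[Key],
                                ontology_keys: Iterable[Key]) -> List[Key]:
    """Return the codes in a new data load that the existing ontology lacks."""
    return sorted(set(new_load) - set(ontology_keys))


if __name__ == "__main__":
    # Made-up sample data; item 9001 is not a real NAACCR item.
    ontology_keys = [(9001, "1"), (9001, "9")]
    new_load = [(9001, "1"), (9001, "2"), (9001, "2")]
    print(codes_missing_from_ontology(new_load, ontology_keys))
```

An empty result would mean the existing ontology already covers the new data; anything else would need new terms added.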


Thanks in advance for any input all of y'all provide.


Patrick Lenon
HIMC Informatics Specialist
608 890 5671

_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
