RE: NAACCR metadata, latest go-round

Lenon Patrick Tue, 18 Nov 2014 06:22:26 -0800

Great, thanks, especially for the bit about the shortcuts.  That would have 
been tough to suss out on my own.

From: Dan Connolly [mailto:[email protected]]
Sent: Monday, November 17, 2014 5:43 PM
To: Lenon Patrick; '[email protected]'
Subject: RE: NAACCR metadata, latest go-round

1) I remember the data being a little messy... the code only loads the sections 
that I was confident had no PHI (aside from the MRN, which gets deidentified 
later). I eyeballed all the data in one of the views. What a pain. But I 
haven't found any way around it.

As to how the code works... I suppose you're reading it correctly. I haven't 
looked at it in quite a while. If you feel like contributing a design sketch to 
add to the top of naaccr_concepts_load.sql, I can see that it gets code 
reviewed and integrated. (see for example the "Medication dispense facts" 
comment in epic_meds_transform.sql). "Patches welcome," as they say.

2) My recollection is that getting the WHO files is pretty painless (at least 
compared with, say, UMLS); anything we would set up to reduce duplicated effort 
would be at least as much hassle... especially since we'd have to set it up.

3) The code to build the ontology is designed to be run each time the data is 
loaded. (No, I don't recall discussing this in gpc-dev).

4) If you got naaccr_shortcuts.csv from version control on elephant, you can 
follow your nose thru "renamed shortcut concepts and reworked staging concept 
hierarchy (#2112<https://bmi-work.kumc.edu/work/ticket/2112>)" to KUMC ticket 
#2112<https://informatics.kumc.edu/work/ticket/2112> to the milford 
release<https://informatics.kumc.edu/work/milestone/heron-milford-update> to... 
darn; there should be a link to the relevant blog article:

  *   HERON Milford simplifies the Cancer Cases 
folder<https://informatics.kumc.edu/work/blog/heron-milford-update>

A new folder, Cancer Cases (Abridged), debuts with the Milford release. This 
folder contains the frequently searched concepts from the Cancer Cases folder. 
Advanced searchers can still search the entire tumor registry in the unabridged 
folder.

Oops; that should be in the list of blog articles on 
TumorRegistry<https://informatics.kumc.edu/work/wiki/TumorRegistry>.

--
Dan
________________________________
From: [email protected]<mailto:[email protected]> [[email protected]]
on behalf of Lenon Patrick [[email protected]]
Sent: Monday, November 17, 2014 4:32 PM
To: '[email protected]'
Subject: NAACCR metadata, latest go-round

Hi all, hoping some of you are in a good state of mind to share your 
experience, brilliance, shattering good looks... enough flattery?

Anyway, my first attempt at building a NAACCR ontology was disappointing in 
that the NAACCR tables were not as helpful as I expected.  Fundamental problem: 
A lot of junk in the column where the code value normally resides.  So in that 
column I have found (besides codes):

*         Code ranges (e.g. "1-100, 110-12")

*         References to outside sources (WHO in particular)

*         The word "BLANK", presumably to indicate the field is optional?

*         ..

*         *

*         Comments, sometimes with HTML markups
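To make the kinds of junk listed above concrete, here is a minimal sketch of how values in the code column might be sorted into buckets. The function name, the category labels, and the regular expressions are all hypothetical; they are not taken from the HERON scripts, and the pattern for a "real" code is only a guess at what plain codes look like.

```python
import re

# Hypothetical patterns; real NAACCR code values may need looser rules.
RANGE_RE = re.compile(r"^\d+\s*-\s*\d+(\s*,\s*\d+\s*-\s*\d+)*$")
CODE_RE = re.compile(r"^[A-Za-z0-9]{1,10}$")


def classify(value):
    """Return a rough label for one value from the code column."""
    text = value.strip()
    if text in ("..", "*"):
        return "placeholder"
    # Check BLANK before the code pattern, which it would otherwise match.
    if text.upper() == "BLANK":
        return "blank"
    if RANGE_RE.match(text):
        return "range"
    if CODE_RE.match(text):
        return "code"
    # Everything else: comments, HTML markup, references to outside sources.
    return "text"
```

Only values labeled "code" could be used directly as concept codes; the other buckets would need manual review or would simply be skipped.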

Looking at the heron naaccr_txform and naaccr_concepts_load scripts, the 
ultimate NAACCR ontology consists of all unique base/concept codes found in the 
imported NAACCR data file (now in table tumor_reg_codes) merged with some 
fields from either the NAACCR code table (naaccr.t_code) or one of the external 
tables (e.g., WHO.TOPO).
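As I understand it, that merge can be sketched roughly as follows. This is an illustration of the description above, not the actual logic in naaccr_concepts_load; the function name, the (item, code) key shape, and the dictionary inputs are assumptions made for the example.

```python
def build_ontology(observed_codes, naaccr_labels, external_labels):
    """Label each distinct (item, code) pair seen in the registry data.

    observed_codes: iterable of (item, code) pairs, standing in for the
        rows of tumor_reg_codes (hypothetical shape)
    naaccr_labels: dict of (item, code) -> label, standing in for
        naaccr.t_code
    external_labels: dict of (item, code) -> label, standing in for an
        external table such as WHO.TOPO
    """
    concepts = {}
    for key in sorted(set(observed_codes)):
        # Prefer the NAACCR label; fall back to the external table.
        label = naaccr_labels.get(key) or external_labels.get(key)
        concepts[key] = label  # None when neither table describes the code
    return concepts
```

Because the loop is driven by the codes observed in the data, rows in the code table that hold ranges or comments never become concepts unless that exact value shows up in the registry file.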

So, questions arise:

1)      First, is the above description reasonably accurate?  Please point out 
any glaring errors.  I did leave out some detail like special cases.

2)      I looked at WHO's site, and began applying for access to their tables 
as listed in heron\heron_staging\tumor_reg\icd_o_meta.py .  However, before I 
continue, will every site have to do this individually?  Tom Mish is of the 
opinion YES.  But, does anyone have a legal/kosher/ethical way for us to not 
duplicate this effort?

3)      If I'm correct that only codes that appear in the Registry data will be 
loaded into the ontology, well, is that OK?  This fixes the problems with code 
ranges and non-code values, and the ontology tree is effectively pre-trimmed.  
But what are the implications for future loads of new data?  Has this 
discussion already happened?

4)      Bonus question:  What the heck is naaccr_shortcuts.csv?  It looks very 
useful, but I have no idea what for.

Thanks in advance for any input all of y'all provide.

Patrick Lenon
HIMC Informatics Specialist
608 890 5671

_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
