1) I remember the data being a little messy... the code only loads the sections that I was confident had no PHI (aside from the MRN, which gets deidentified later). I eyeballed all the data in one of the views. What a pain. But I haven't found any way around it.
As to how the code works... I suppose you're reading it correctly. I haven't looked at it in quite a while. If you feel like contributing a design sketch to add to the top of naaccr_concepts_load.sql, I can see that it gets code reviewed and integrated (see, for example, the "Medication dispense facts" comment in epic_meds_transform.sql). "Patches welcome," as they say.

2) My recollection is that getting the WHO files is pretty painless (at least compared with, say, UMLS); anything we would set up to reduce duplicated effort would be at least as much hassle... especially since we'd have to set it up.

3) The code to build the ontology is designed to be run each time the data is loaded. (No, I don't recall discussing this in gpc-dev.)

4) If you got naaccr_shortcuts.csv from version control on elephant, you can follow your nose thru "renamed shortcut concepts and reworked staging concept hierarchy (#2112 <https://bmi-work.kumc.edu/work/ticket/2112>)" to KUMC ticket #2112 <https://informatics.kumc.edu/work/ticket/2112> to the milford release <https://informatics.kumc.edu/work/milestone/heron-milford-update> to... darn; there should be a link to the relevant blog article:

  * HERON Milford simplifies the Cancer Cases folder <https://informatics.kumc.edu/work/blog/heron-milford-update>
    A new folder, Cancer Cases (Abridged), debuts with the Milford release. This folder contains the frequently searched concepts from the Cancer Cases folder. Advanced searchers can still search the entire tumor registry in the unabridged folder.

Oops; that should be in the list of blog articles on TumorRegistry <https://informatics.kumc.edu/work/wiki/TumorRegistry>.

-- Dan

________________________________
From: [email protected] [[email protected]] on behalf of Lenon Patrick [[email protected]]
Sent: Monday, November 17, 2014 4:32 PM
To: '[email protected]'
Subject: NAACCR metadata, latest go-round

Hi all, hoping some of you are in a good state of mind to share your experience, brilliance, shattering good looks… enough flattery?

Anyway, my first attempt at building a NAACCR ontology was disappointing in that the NAACCR tables were not as helpful as I expected. Fundamental problem: A lot of junk in the column where the code value normally resides. So in that column I have found (besides codes):

· Code ranges (e.g. “1-100, 110-12”)
· References to outside sources (WHO in particular)
· The word “BLANK”, presumably to indicate the field is optional?
· ..
· *
· Comments, sometimes with HTML markups

Looking at the heron naaccr_txform and naaccr_concepts_load scripts, the ultimate NAACCR ontology consists of all unique base/concept codes found in the imported NAACCR data file (now in table tumor_reg_codes) merged with some fields from either the NAACCR code table (naaccr.t_code) or one of the external tables (e.g., WHO.TOPO).

So, questions arise:

1) First, is the above description reasonably accurate? Please point out any glaring errors. I did leave out some detail like special cases.
2) I looked at WHO’s site, and began applying for access to their tables as listed in heron\heron_staging\tumor_reg\icd_o_meta.py. However, before I continue, will every site have to do this individually? Tom Mish is of the opinion YES. But, does anyone have a legal/kosher/ethical way for us to not duplicate this effort?
3) If I’m correct that only codes that appear in the Registry data will be loaded into the ontology, well, is that OK? This fixes the problems with code ranges and non-code values, and the ontology tree is effectively pre-trimmed. But what are the implications for future loads of new data? Has this discussion already happened?
4) Bonus question: What the heck is naaccr_shortcuts.csv? It looks very useful, but I have no idea what for.

Thanks in advance for any input all of y’all provide.

Patrick Lenon
HIMC Informatics Specialist
608 890 5671
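
For readers following the thread, here is a minimal Python sketch of the approach described in the message above: sort the raw contents of the code column into codes versus the junk categories listed, then build ontology terms only from codes that actually occur in the registry data, picking up a label from the code table where one exists. This is not the HERON implementation, which lives in the SQL scripts named above. The function names, the tuple layouts, and the regular expressions are all assumptions made for illustration, and the real layouts of tumor_reg_codes and naaccr.t_code are not shown in this thread.

```python
import re

# Assumption: a "real" code is a short run of letters/digits with no spaces.
CODE_RE = re.compile(r"^[A-Za-z0-9]+$")
# Assumption: a range cell looks like "1-100" or "1-100, 110-12".
RANGE_RE = re.compile(r"^\d+\s*-\s*\d+(\s*,\s*\d+\s*-\s*\d+)*$")


def classify_code_value(raw):
    """Classify one raw cell from the code column (hypothetical categories)."""
    value = raw.strip()
    if value in {"", "..", "*"}:
        return "placeholder"
    if value.upper() == "BLANK":
        return "blank"
    if RANGE_RE.match(value):
        return "range"
    if "<" in value and ">" in value:
        return "comment"  # likely a comment carrying HTML markup
    if CODE_RE.match(value):
        return "code"
    return "other"  # e.g. a reference to an outside source


def build_ontology(observed, code_table):
    """Return one term per distinct (item, code) seen in the data.

    observed:   iterable of (item, code) pairs from the registry data
    code_table: iterable of (item, raw_code, description) rows
    Both layouts are assumptions, not the actual table definitions.
    """
    # Keep only code-table rows whose code column holds a plain code.
    labels = {
        (item, raw.strip()): description
        for item, raw, description in code_table
        if classify_code_value(raw) == "code"
    }
    # Codes with no matching code-table row get a label of None.
    return [
        {"item": item, "code": code, "label": labels.get((item, code))}
        for item, code in sorted(set(observed))
    ]
```

Because the terms are driven by what is observed in the data, range rows and non-code rows in the code table never reach the ontology. The flip side is that a code first seen in a later data load only appears once the build is run again, which is consistent with the reply that the build is meant to run on every load.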

_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
