Hi Dr. Chrischilles,

In identifying the data elements that matter for characterizing the breast cancer cohort, your group doesn't need to identify specific code sets at the outset. We're happy to start with whatever terms you typically use in discussions with your peers, patients, etc. If you happen to know the relevant standard coding systems, that's great; if not, I'm sure we can work together to identify them.
Meanwhile... The closest thing I have handy to a list of the NAACCR items that KUMC is currently extracting is probably our heron_load/curated_data/naaccr_shortcuts.csv<https://informatics.kumc.edu/work/browser/heron_load/curated_data/naaccr_shortcuts.csv> file:

PARENT,KEY,NAME,KIND,KEY_TYPE
,Abridged,Cancer Case Identifiers (Abridged),fabricated,
Abridged,Demographics,Cancer Demographics (Selected),fabricated,
Demographics,0160 Race,,tree,name
Demographics,0190 Spanish/Hispanic Origin,,tree,name
Demographics,0220 Sex,,tree,name
Demographics,0240 Date of Birth,,tree,name
Abridged,0521 Morph,,tree,name
Abridged,0490 Diagnostic Confirmation,,tree,name
Abridged,0523 Behavior Code ICD-O-3,,tree,name
Abridged,0440 Grade,,tree,name
Abridged,0390 Date of Diagnosis,,tree,name
Abridged,SEER Site Summary,,tree,name
Abridged,0630 Primary Payer at DX,,tree,name
Abridged,3000 Derived AJCC-6 Stage Grp,,tree,name
Abridged,3430 Derived AJCC-7 Stage Grp,,tree,name
Abridged,0910 TNM Path Stage Group,,tree,name
Abridged,0970 TNM Clin Stage Group,,tree,name
Abridged,1750 Date of Last Contact,,tree,name
Abridged,1760 Vital Status,,tree,name
Abridged,1860 Recurrence Date--1st,,tree,name
Abridged,1880 Recurrence Type--1st,,tree,name
Abridged,1910 Cause of Death,,tree,name
Abridged,0610 Class of Case,0610 Class of Case (Selected),node,name
0610 Class of Case,Analytic,Analytic,fabricated,
0610 Class of Case,Non-Analytic,Non-Analytic,fabricated,
Analytic,NAACCR|610:00,,tree,basecode
Analytic,NAACCR|610:10,,tree,basecode
Analytic,NAACCR|610:11,,tree,basecode
Analytic,NAACCR|610:12,,tree,basecode
Analytic,NAACCR|610:13,,tree,basecode
Analytic,NAACCR|610:14,,tree,basecode
Analytic,NAACCR|610:20,,tree,basecode
Analytic,NAACCR|610:21,,tree,basecode
Analytic,NAACCR|610:22,,tree,basecode
Non-Analytic,NAACCR|610:30,,tree,basecode
Non-Analytic,NAACCR|610:31,,tree,basecode
Non-Analytic,NAACCR|610:32,,tree,basecode
Non-Analytic,NAACCR|610:33,,tree,basecode
Non-Analytic,NAACCR|610:34,,tree,basecode
Non-Analytic,NAACCR|610:36,,tree,basecode
Non-Analytic,NAACCR|610:38,,tree,basecode
Non-Analytic,NAACCR|610:40,,tree,basecode
Non-Analytic,NAACCR|610:99,,tree,basecode

That's the result of a discussion with cancer researchers at KUMC. (It looks nicer when presented through the HERON/i2b2 web user interface. I should probably make a screenshot, but I don't have VPN access set up just now.)

The cancer-specific data elements seem to be:

  * 0521 Morph (morphology)
  * 0490 Diagnostic Confirmation (I'm not sure what that is)
  * 0523 Behavior Code ICD-O-3
  * 0440 Grade
  * SEER Site Summary
  * 3000 Derived AJCC-6 Stage Grp
  * ... stage according to a few other sources
  * 1880 Recurrence Type--1st
  * 0610 Class of Case

The demographics and vital status are somewhat redundant with other demographic data in HERON; further harmonization is on our todo list, but so far, our approach has been to err on the side of making data available and letting investigators weed out the redundancies.

Thanks for letting me include the public gpc-dev forum in this discussion. In that context, a few further technical details...

The original KUMC/HERON TumorRegistry<https://informatics.kumc.edu/work/wiki/TumorRegistry> approach did not involve curating specific NAACCR items. Rather, we just use a generic algorithm that grabs everything in the NAACCR file and weeds out stuff that might be PHI. It results in a term hierarchy that parallels the text of the NAACCR format spec:

  * Thornton M, (ed). DATA STANDARDS AND DATA DICTIONARY Standards for Cancer Registries Volume II: Data Standards and Data Dictionary<http://www.naaccr.org/LinkClick.aspx?fileticket=LJJNRVo4lT4%3d&tabid=133&mid=473>, Record Layout Version 12.1, 15th ed. Springfield, Ill.: North American Association of Central Cancer Registries, June 2010.

The result is pretty overwhelming, from a usability perspective. Hence the list of "shortcuts" above.
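In case it helps to see how the shortcut rows hang together, here is a minimal Python sketch that reads a handful of them and prints the hierarchy implied by the PARENT and KEY columns. It is an illustration only, not the code HERON uses: the excerpt is transcribed from the list above, the placement of the empty cells is my assumption, and the names load_children and outline are made up for this example.

```python
import csv
import io
from collections import defaultdict

# Excerpt of the shortcut rows listed above. Which cells are empty is an
# assumption made when transcribing the list, not a copy of the real file.
SHORTCUTS_CSV = """\
PARENT,KEY,NAME,KIND,KEY_TYPE
,Abridged,Cancer Case Identifiers (Abridged),fabricated,
Abridged,Demographics,Cancer Demographics (Selected),fabricated,
Demographics,0160 Race,,tree,name
Demographics,0220 Sex,,tree,name
Abridged,0440 Grade,,tree,name
Abridged,0610 Class of Case,0610 Class of Case (Selected),node,name
0610 Class of Case,Analytic,Analytic,fabricated,
0610 Class of Case,Non-Analytic,Non-Analytic,fabricated,
Analytic,NAACCR|610:00,,tree,basecode
Non-Analytic,NAACCR|610:30,,tree,basecode
"""


def load_children(text):
    """Map each PARENT value to the rows that name it as their parent."""
    children = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        children[row["PARENT"]].append(row)
    return children


def outline(children, parent="", depth=0):
    """Return the hierarchy as indented lines, depth-first from the root."""
    lines = []
    for row in children.get(parent, []):
        # Prefer the display NAME when one is given, else fall back to KEY.
        label = row["NAME"] or row["KEY"]
        lines.append("  " * depth + label)
        lines.extend(outline(children, row["KEY"], depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(load_children(SHORTCUTS_CSV))))
```

Rows with an empty PARENT are treated as roots, so the excerpt prints as one tree rooted at "Cancer Case Identifiers (Abridged)", with the Class of Case codes nested under Analytic and Non-Analytic.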
FWIW, the code for the generic algorithm is in naaccr_txform.sql<https://informatics.kumc.edu/work/browser/heron_load/naaccr_txform.sql>; it excludes whole sections of the spec (e.g. 8 -- Patient-Confidential) as well as data types that may contain PHI ('5-digit or 9-digit U.S. ZIP codes%', 'Text--%', etc.):

    ...
    and ns.SectionID in (
      1 -- Cancer Identification
    , 2 -- Demographic
    -- , 3 -- Edit Overrides/Conversion History/System Admin
    , 4 -- Follow-up/Recurrence/Death
    -- , 5 -- Hospital-Confidential
    , 6 -- Hospital-Specific
    -- , 7 -- Other-Confidential
    -- , 8 -- Patient-Confidential
    -- , 9 -- Record ID
    -- , 10 -- Special Use
    , 11 -- Stage/Prognostic Factors -- TODO: numeric stuff
    -- , 12 -- Text-Diagnosis
    -- , 13 -- Text-Miscellaneous
    -- , 14 -- Text-Treatment
    -- , 15 -- Treatment-1st Course
    , 16 -- Treatment-Subsequent & Other
    , 17 -- Pathology
    )
    -- TODO: store these in the ID repository and de-id later
    and ni."AllowValue" not like 'City name or UNKNOWN'
    and ni."AllowValue" not like 'Reference to EDITS table BPLACE.DBF in Appendix B'
    and ni."AllowValue" not like '5-digit or 9-digit U.S. ZIP codes%'
    and ni."AllowValue" not like 'Census Tract Codes%'
    and ni."AllowValue" not like 'See Appendix A for standard FIPS county codes%'
    and ni."AllowValue" not like 'See Appendix A for county codes for each state.%'
    and ni."ItemName" not like 'Age at Diagnosis'
    and ni."ItemName" not like 'Text--%'
    and ni."ItemName" not like 'Place of Death'

-- Dan

________________________________
From: Tamara McMahon
Sent: Friday, January 24, 2014 3:52 PM
To: Dan Connolly
Subject: FW: NAACCR data

Do we have a list I can point her to?

From: Chrischilles, Elizabeth A [mailto:[email protected]]
Sent: Friday, January 24, 2014 1:17 PM
To: Tamara McMahon
Subject: RE: NAACCR data

Tamara,

Is there a list of the NAACCR items that the KUMC is currently extracting from NAACCR? That would help.
Betsy

From: Tamara McMahon [mailto:[email protected]]
Sent: Wednesday, January 22, 2014 3:54 PM
To: Chrischilles, Elizabeth A
Subject: GPC: NAACCR data

I spoke with Dan, who is heading the Development and Standards work for PCORI. Currently there is no timeline set for having each location online with NAACCR data. This is an agenda item for our KUMC PM meeting tomorrow. I’ll let you know the results and when we can expect NAACCR data to be available via the data warehouses.

Dan did mention that the breast cancer group will need to provide him a list of the data needed for the project. GPC isn’t going to standardize all the NAACCR data across all sites in the initial 18 months but will focus on the data elements needed for the study. So, the breast cancer group should define what data elements, both NAACCR and non-NAACCR (e.g., diagnosis X, site, behavior, recurrence, specific medication types, certain procedures, etc.), are needed for the study. If there are any known codes, such as CPT, ICD-9, ICD-O-3, that would help too.

Thanks,

Tamara McMahon
Clinical Informatics Coordinator
Division of Medical Informatics
University of Kansas Medical Center
913-945-7470
