Russ reminds me that Table 6.1 of the GPC proposal<http://frontiersresearch.org/frontiers/sites/default/files/frontiers/documents/GPC-PCORI-CDRN-Research-Plan-Template-KUMCv44.pdf> already has a pretty good sketch of breast cancer data elements and such:

Proposed Data Elements

Risk factor information: age, race/ethnicity, sex (1% are men), family history, genetic markers (BRCA1/BRCA2), Oncotype or Mammoprint recurrence risk, age at first full-term pregnancy, breast feeding history, age at first menses, menopausal status, use of hormone replacement therapy, alcohol use, tobacco use, body mass index, breast density, prior history of breast cancer, prior radiation treatment, prior diagnosis of breast tissue hyperplasia.

Baseline information: diagnostic tests, tumor stage, size, number of positive lymph nodes, grade, histology, hormone, HER2 and EGFR receptor status, performance status, comorbidity, body mass index.

Initial treatment: chemotherapy, hormonal therapy, molecular targeted therapy, surgical procedures.

During treatment: laboratory (WBC), psychosocial characteristics including pain, quality of life, positive meaning and vulnerability, adverse effects including lymphedema, shoulder function, pain, depression, nausea.

Post-treatment/survivorship: cancer surveillance imaging, medications including aromatase inhibitors and hormone modulators, psychosocial measures, adverse effects, late effects such as cardiotoxicity, ACoS/IOM survivorship process measures including having a care plan, treatment summary, measure of distress, and lifestyle characteristics such as body mass index, exercise, nutrition, alcohol, and smoking.

Target Population Size

We expect 4,500 newly diagnosed cases among GPC member sites annually; The number of prevalent survivors currently is estimated to be 54,000 based on US prevalence statistics.

Proposed Membership Sources

Tumor registries are a highly evolved, high quality, complete source of tumor characteristics that are not always integrated within the EHR. A major thrust of GPC work will be to integrate these data streams.
Cancer therapeutics including oral and injectable chemotherapies, radiation, and surgery, are well-represented in the EHR as are many of the risk factors, diagnostic tests and comorbidity. Critical prognostic data (i.e. risk factors or baseline status) that are known to be documented incompletely or in unstandardized ways include performance status and out-of-system tests for receptor status and recurrence risk. Programming prospective collection of these data using clinical decision-making tools available in the EHR (e.g. Epic Best Practice alerts) will be an incredibly important and valuable activity for the GPC. Incorporating recording of standardized (e.g. NIH PROMIS system) patient-reported measures such as pain, distress, and quality of life into the clinical work flow is another major need that we plan to address.

________________________________
From: Dan Connolly
Sent: Sunday, January 26, 2014 9:34 PM
To: Tamara McMahon; Chrischilles, Elizabeth A
Cc: [email protected]
Subject: RE: NAACCR data (and breast cancer data elements)

Hi Dr. Chrischilles,

In identifying data elements important for characterizing the breast cancer cohort, it's not essential that your group identifies specific code sets, initially. We're happy to start with whatever terms you typically use in your discussion with your peers, patients, etc. If you happen to know relevant standard coding systems, that's great; but if not, I'm sure we can work together to identify them.

Meanwhile...

The closest thing I have handy to a list of the NAACCR items that the KUMC is currently extracting from NAACCR is probably in our heron_load/curated_data/naaccr_shortcuts.csv<https://informatics.kumc.edu/work/browser/heron_load/curated_data/naaccr_shortcuts.csv> file:

PARENT              KEY                             NAME                                KIND        KEY_TYPE
                    Abridged                        Cancer Case Identifiers (Abridged)  fabricated
Abridged            Demographics                    Cancer Demographics (Selected)      fabricated
Demographics        0160 Race                                                           tree        name
Demographics        0190 Spanish/Hispanic Origin                                        tree        name
Demographics        0220 Sex                                                            tree        name
Demographics        0240 Date of Birth                                                  tree        name
Abridged            0521 Morph                                                          tree        name
Abridged            0490 Diagnostic Confirmation                                        tree        name
Abridged            0523 Behavior Code ICD-O-3                                          tree        name
Abridged            0440 Grade                                                          tree        name
Abridged            0390 Date of Diagnosis                                              tree        name
Abridged            SEER Site Summary                                                   tree        name
Abridged            0630 Primary Payer at DX                                            tree        name
Abridged            3000 Derived AJCC-6 Stage Grp                                       tree        name
Abridged            3430 Derived AJCC-7 Stage Grp                                       tree        name
Abridged            0910 TNM Path Stage Group                                           tree        name
Abridged            0970 TNM Clin Stage Group                                           tree        name
Abridged            1750 Date of Last Contact                                           tree        name
Abridged            1760 Vital Status                                                   tree        name
Abridged            1860 Recurrence Date--1st                                           tree        name
Abridged            1880 Recurrence Type--1st                                           tree        name
Abridged            1910 Cause of Death                                                 tree        name
Abridged            0610 Class of Case              0610 Class of Case (Selected)       node        name
0610 Class of Case  Analytic                        Analytic                            fabricated
0610 Class of Case  Non-Analytic                    Non-Analytic                        fabricated
Analytic            NAACCR|610:00                                                       tree        basecode
Analytic            NAACCR|610:10                                                       tree        basecode
Analytic            NAACCR|610:11                                                       tree        basecode
Analytic            NAACCR|610:12                                                       tree        basecode
Analytic            NAACCR|610:13                                                       tree        basecode
Analytic            NAACCR|610:14                                                       tree        basecode
Analytic            NAACCR|610:20                                                       tree        basecode
Analytic            NAACCR|610:21                                                       tree        basecode
Analytic            NAACCR|610:22                                                       tree        basecode
Non-Analytic        NAACCR|610:30                                                       tree        basecode
Non-Analytic        NAACCR|610:31                                                       tree        basecode
Non-Analytic        NAACCR|610:32                                                       tree        basecode
Non-Analytic        NAACCR|610:33                                                       tree        basecode
Non-Analytic        NAACCR|610:34                                                       tree        basecode
Non-Analytic        NAACCR|610:36                                                       tree        basecode
Non-Analytic        NAACCR|610:38                                                       tree        basecode
Non-Analytic        NAACCR|610:40                                                       tree        basecode
Non-Analytic        NAACCR|610:99                                                       tree        basecode

That's the result of a discussion with cancer researchers at KUMC. (It looks nicer when presented through the HERON/i2b2 web user interface. I should probably make a screenshot, but I don't have VPN access set up just now.)

The cancer-specific data elements seem to be:

* 0521 Morph (morphology)
* 0490 Diagnostic Confirmation (I'm not sure what that is)
* 0523 Behavior Code ICD-O-3
* 0440 Grade
* SEER Site Summary
* 3000 Derived AJCC-6 Stage Grp
* ... stage according to a few other sources
* 1880 Recurrence Type--1st
* 0610 Class of Case

The demographics and vital status are somewhat redundant with other demographic data in HERON; further harmonization is on our todo list, but so far, our approach has been to err on the side of making data available and letting investigators weed out the redundancies.

Thanks for letting me include the public gpc-dev forum in this discussion. In that context, a few further technical details...

The original KUMC/HERON TumorRegistry<https://informatics.kumc.edu/work/wiki/TumorRegistry> approach did not involve curating specific NAACCR items. Rather, we just use a generic algorithm that grabs everything in the NAACCR file and weeds out stuff that might be PHI. It results in a term hierarchy that parallels the text of the NAACCR format spec:

* Thornton M, (ed). DATA STANDARDS AND DATA DICTIONARY Standards for Cancer Registries Volume II: Data Standards and Data Dictionary<http://www.naaccr.org/LinkClick.aspx?fileticket=LJJNRVo4lT4%3d&tabid=133&mid=473>, Record Layout Version 12.1, 15th ed. Springfield, Ill.: North American Association of Central Cancer Registries, June 2010.

The result is pretty overwhelming, from a usability perspective. Hence the list of "shortcuts" above.

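To make the PARENT/KEY structure of the shortcuts list concrete, here is a minimal Python sketch that groups such rows into a tree and prints it with indentation. It is an illustration only, not the HERON loader: the inline sample is a hand-copied subset of the rows above, the comma-separated layout is an assumption about how naaccr_shortcuts.csv is stored, and the names SAMPLE, build_tree and render are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hand-copied subset of the shortcut rows listed above. The comma-separated
# layout is an assumption; the real naaccr_shortcuts.csv may differ.
SAMPLE = """\
PARENT,KEY,NAME,KIND,KEY_TYPE
,Abridged,Cancer Case Identifiers (Abridged),fabricated,
Abridged,Demographics,Cancer Demographics (Selected),fabricated,
Demographics,0160 Race,,tree,name
Demographics,0220 Sex,,tree,name
Abridged,0440 Grade,,tree,name
Abridged,0610 Class of Case,0610 Class of Case (Selected),node,name
0610 Class of Case,Analytic,Analytic,fabricated,
Analytic,NAACCR|610:00,,tree,basecode
"""


def build_tree(text):
    """Map each PARENT value to the list of rows filed under it."""
    children = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        children[row["PARENT"]].append(row)
    return children


def render(children, parent="", depth=0):
    """Return indented display lines, depth-first from the root rows."""
    lines = []
    for row in children.get(parent, []):
        label = row["NAME"] or row["KEY"]  # fall back to the key if unnamed
        lines.append("  " * depth + label)
        lines.extend(render(children, row["KEY"], depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(build_tree(SAMPLE))))
```

Rows with an empty PARENT are treated as roots, and a row's KEY is what its children name in their PARENT column; that reading is inferred from the listing above rather than from any documentation.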
FWIW, the code for the generic algorithm is in naaccr_txform.sql<https://informatics.kumc.edu/work/browser/heron_load/naaccr_txform.sql>; it excludes whole sections of the spec (e.g. 8 -- Patient-Confidential) as well as data types that may contain PHI ('5-digit or 9-digit U.S. ZIP codes%', 'Text--%', etc.)

    ...
    and ns.SectionID in (
      1 -- Cancer Identification
    , 2 -- Demographic
    -- , 3 -- Edit Overrides/Conversion History/System Admin
    , 4 -- Follow-up/Recurrence/Death
    -- , 5 -- Hospital-Confidential
    , 6 -- Hospital-Specific
    -- , 7 -- Other-Confidential
    -- , 8 -- Patient-Confidential
    -- , 9 -- Record ID
    -- , 10 -- Special Use
    , 11 -- Stage/Prognostic Factors
    -- TODO: numeric stuff
    -- , 12 -- Text-Diagnosis
    -- , 13 -- Text-Miscellaneous
    -- , 14 -- Text-Treatment
    , 15 -- Treatment-1st Course
    , 16 -- Treatment-Subsequent & Other
    , 17 -- Pathology
    )
    -- TODO: store these in the ID repository and de-id later
    and ni."AllowValue" not like 'City name or UNKNOWN'
    and ni."AllowValue" not like 'Reference to EDITS table BPLACE.DBF in Appendix B'
    and ni."AllowValue" not like '5-digit or 9-digit U.S. ZIP codes%'
    and ni."AllowValue" not like 'Census Tract Codes%'
    and ni."AllowValue" not like 'See Appendix A for standard FIPS county codes%'
    and ni."AllowValue" not like 'See Appendix A for county codes for each state.%'
    and ni."ItemName" not like 'Age at Diagnosis'
    and ni."ItemName" not like 'Text--%'
    and ni."ItemName" not like 'Place of Death'

-- Dan

________________________________
From: Tamara McMahon
Sent: Friday, January 24, 2014 3:52 PM
To: Dan Connolly
Subject: FW: NAACCR data

Do we have a list I can point her to?

From: Chrischilles, Elizabeth A [mailto:[email protected]]
Sent: Friday, January 24, 2014 1:17 PM
To: Tamara McMahon
Subject: RE: NAACCR data

Tamara,

Is there a list of the NAACCR items that the KUMC is currently extracting from NAACCR? That would help.
Betsy

From: Tamara McMahon [mailto:[email protected]]
Sent: Wednesday, January 22, 2014 3:54 PM
To: Chrischilles, Elizabeth A
Subject: GPC: NAACCR data

I spoke with Dan, who is heading the Development and Standards work for PCORI. Currently there is no timeline set for having each location online with NAACCR data. This is an agenda item for our KUMC PM meeting tomorrow. I’ll let you know the results and when we can expect NAACCR data to be available via data warehouses.

Dan did mention that the breast cancer group will need to provide him a list of data needed for the project. GPC isn’t going to standardize all the NAACCR data across all sites in the initial 18 months but will focus on the data elements needed for the study. So, the breast cancer group should define what data elements, both NAACCR and non-NAACCR (e.g., diagnosis X, site, behavior, recurrence, specific medication types, certain procedures, etc.), are needed for the study. If there are any known codes, such as CPT, ICD-9, or ICD-O-3, that would help too.

Thanks,

Tamara McMahon
Clinical Informatics Coordinator
Division of Medical Informatics
University of Kansas Medical Center
913-945-7470
