Russ reminds me that Table 6.1 of the GPC 
proposal<http://frontiersresearch.org/frontiers/sites/default/files/frontiers/documents/GPC-PCORI-CDRN-Research-Plan-Template-KUMCv44.pdf>
 already has a pretty good sketch of breast cancer data elements and such:

Proposed Data Elements

Risk factor information: age, race/ethnicity, sex (1% are men), family
history, genetic markers (BRCA1/BRCA2), Oncotype or MammaPrint recurrence
risk, age at first full-term pregnancy, breast feeding history, age at first
menses, menopausal status, use of hormone replacement therapy, alcohol use,
tobacco use, body mass index, breast density, prior history of breast cancer,
prior radiation treatment, prior diagnosis of breast tissue hyperplasia.

Baseline information: diagnostic tests, tumor stage, size, number of positive
lymph nodes, grade, histology, hormone, HER2 and EGFR receptor status,
performance status, comorbidity, body mass index.

Initial treatment: chemotherapy, hormonal therapy, molecular targeted therapy,
surgical procedures.

During treatment: laboratory (WBC), psychosocial characteristics including
pain, quality of life, positive meaning and vulnerability, adverse effects
including lymphedema, shoulder function, pain, depression, nausea.

Post-treatment/survivorship: cancer surveillance imaging, medications
including aromatase inhibitors and hormone modulators, psychosocial measures,
adverse effects, late effects such as cardiotoxicity, ACoS/IOM survivorship
process measures including having a care plan, treatment summary, measure of
distress, and lifestyle characteristics such as body mass index, exercise,
nutrition, alcohol, and smoking.

Target Population Size

We expect 4,500 newly diagnosed cases among GPC member sites annually. The
number of prevalent survivors is currently estimated to be 54,000, based on US
prevalence statistics.

Proposed Membership Sources

Tumor registries are a highly evolved, high-quality, complete source of tumor
characteristics that are not always integrated within the EHR. A major thrust
of GPC work will be to integrate these data streams. Cancer therapeutics,
including oral and injectable chemotherapies, radiation, and surgery, are
well-represented in the EHR, as are many of the risk factors, diagnostic tests
and comorbidity. Critical prognostic data (i.e., risk factors or baseline
status) that are known to be documented incompletely or in unstandardized ways
include performance status and out-of-system tests for receptor status and
recurrence risk. Programming prospective collection of these data using
clinical decision-making tools available in the EHR (e.g., Epic Best Practice
alerts) will be an incredibly important and valuable activity for the GPC.
Incorporating recording of standardized (e.g., NIH PROMIS system)
patient-reported measures such as pain, distress, and quality of life into the
clinical workflow is another major need that we plan to address.


________________________________
From: Dan Connolly
Sent: Sunday, January 26, 2014 9:34 PM
To: Tamara McMahon; Chrischilles, Elizabeth A
Cc: [email protected]
Subject: RE: NAACCR data (and breast cancer data elements)

Hi Dr. Chrischilles,

In identifying data elements important for characterizing the breast cancer 
cohort, it's not essential that your group identifies specific code sets, 
initially. We're happy to start with whatever terms you typically use in your 
discussion with your peers, patients, etc. If you happen to know relevant 
standard coding systems, that's great; but if not, I'm sure we can work 
together to identify them.

Meanwhile...

The closest thing I have handy to a list of the NAACCR items that the KUMC is 
currently extracting from NAACCR is probably in our 
heron_load/curated_data/naaccr_shortcuts.csv<https://informatics.kumc.edu/work/browser/heron_load/curated_data/naaccr_shortcuts.csv>
 file:

PARENT              KEY                            NAME                                KIND        KEY_TYPE
                    Abridged                       Cancer Case Identifiers (Abridged)  fabricated
Abridged            Demographics                   Cancer Demographics (Selected)      fabricated
Demographics        0160 Race                                                          tree        name
Demographics        0190 Spanish/Hispanic Origin                                       tree        name
Demographics        0220 Sex                                                           tree        name
Demographics        0240 Date of Birth                                                 tree        name
Abridged            0521 Morph                                                         tree        name
Abridged            0490 Diagnostic Confirmation                                       tree        name
Abridged            0523 Behavior Code ICD-O-3                                         tree        name
Abridged            0440 Grade                                                         tree        name
Abridged            0390 Date of Diagnosis                                             tree        name
Abridged            SEER Site Summary                                                  tree        name
Abridged            0630 Primary Payer at DX                                           tree        name
Abridged            3000 Derived AJCC-6 Stage Grp                                      tree        name
Abridged            3430 Derived AJCC-7 Stage Grp                                      tree        name
Abridged            0910 TNM Path Stage Group                                          tree        name
Abridged            0970 TNM Clin Stage Group                                          tree        name
Abridged            1750 Date of Last Contact                                          tree        name
Abridged            1760 Vital Status                                                  tree        name
Abridged            1860 Recurrence Date--1st                                          tree        name
Abridged            1880 Recurrence Type--1st                                          tree        name
Abridged            1910 Cause of Death                                                tree        name
Abridged            0610 Class of Case             0610 Class of Case (Selected)       node        name
0610 Class of Case  Analytic                       Analytic                            fabricated
0610 Class of Case  Non-Analytic                   Non-Analytic                        fabricated
Analytic            NAACCR|610:00                                                      tree        basecode
Analytic            NAACCR|610:10                                                      tree        basecode
Analytic            NAACCR|610:11                                                      tree        basecode
Analytic            NAACCR|610:12                                                      tree        basecode
Analytic            NAACCR|610:13                                                      tree        basecode
Analytic            NAACCR|610:14                                                      tree        basecode
Analytic            NAACCR|610:20                                                      tree        basecode
Analytic            NAACCR|610:21                                                      tree        basecode
Analytic            NAACCR|610:22                                                      tree        basecode
Non-Analytic        NAACCR|610:30                                                      tree        basecode
Non-Analytic        NAACCR|610:31                                                      tree        basecode
Non-Analytic        NAACCR|610:32                                                      tree        basecode
Non-Analytic        NAACCR|610:33                                                      tree        basecode
Non-Analytic        NAACCR|610:34                                                      tree        basecode
Non-Analytic        NAACCR|610:36                                                      tree        basecode
Non-Analytic        NAACCR|610:38                                                      tree        basecode
Non-Analytic        NAACCR|610:40                                                      tree        basecode
Non-Analytic        NAACCR|610:99                                                      tree        basecode

That's the result of a discussion with cancer researchers at KUMC.

(It looks nicer when presented through the HERON/i2b2 web user interface. I 
should probably make a screenshot, but I don't have VPN access set up just now.)
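
For anyone who wants to look at the shortcut rows outside the web UI, here is a
minimal Python sketch (not part of heron_load) of how the PARENT and KEY columns
can be read as a hierarchy. The comma delimiter, the inline sample rows, and
the helper names load_children and path_to are assumptions made for
illustration; the real file's layout may differ.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the listing above, written as CSV for illustration.
# The delimiter and the empty NAME cells are assumptions about the real file.
SAMPLE = """\
PARENT,KEY,NAME,KIND,KEY_TYPE
,Abridged,Cancer Case Identifiers (Abridged),fabricated,
Abridged,0440 Grade,,tree,name
Abridged,0610 Class of Case,0610 Class of Case (Selected),node,name
0610 Class of Case,Analytic,Analytic,fabricated,
0610 Class of Case,Non-Analytic,Non-Analytic,fabricated,
Analytic,NAACCR|610:00,,tree,basecode
Analytic,NAACCR|610:10,,tree,basecode
Non-Analytic,NAACCR|610:30,,tree,basecode
"""


def load_children(text):
    """Map each PARENT to the list of KEYs filed under it, in file order."""
    children = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        children[row["PARENT"]].append(row["KEY"])
    return children


def path_to(children, key):
    """Return the chain of keys from the root down to `key`, or None."""
    parent_of = {k: p for p, keys in children.items() for k in keys}
    if key not in parent_of:
        return None
    path = [key]
    # Walk upward until we reach a row whose PARENT is empty (the root).
    while parent_of.get(path[-1]):
        path.append(parent_of[path[-1]])
    return list(reversed(path))


if __name__ == "__main__":
    tree = load_children(SAMPLE)
    print(" > ".join(path_to(tree, "NAACCR|610:30")))
```

Walking from a basecode up to the root like this shows which "shortcut" folder
a given Class of Case code would appear under.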

The cancer-specific data elements seem to be:

  *   0521 Morph (morphology)
  *   0490 Diagnostic Confirmation (I'm not sure what that is)
  *   0523 Behavior Code ICD-O-3
  *   0440 Grade
  *   SEER Site Summary
  *   3000 Derived AJCC-6 Stage Grp
  *   ... stage according to a few other sources
  *   1880 Recurrence Type--1st
  *   0610 Class of Case

The demographics and vital status are somewhat redundant with other demographic 
data in HERON; further harmonization is on our todo list, but so far, our 
approach has been to err on the side of making data available and letting 
investigators weed out the redundancies.

Thanks for letting me include the public gpc-dev forum in this discussion. In 
that context, a few further technical details...


The original KUMC/HERON 
TumorRegistry<https://informatics.kumc.edu/work/wiki/TumorRegistry> approach 
did not involve curating specific NAACCR items. Rather, we just use a generic 
algorithm that grabs everything in the NAACCR file and weeds out stuff that 
might be PHI. It results in a term hierarchy that parallels the text of the 
NAACCR format spec:


  *   Thornton M (ed). Standards for Cancer Registries Volume II: Data 
Standards and Data 
Dictionary<http://www.naaccr.org/LinkClick.aspx?fileticket=LJJNRVo4lT4%3d&tabid=133&mid=473>,
 Record Layout Version 12.1, 15th ed. Springfield, Ill.: North American 
Association of Central Cancer Registries, June 2010.

The result is pretty overwhelming, from a usability perspective. Hence the list 
of "shortcuts" above.

FWIW, the code for the generic algorithm is in 
naaccr_txform.sql<https://informatics.kumc.edu/work/browser/heron_load/naaccr_txform.sql>;
 it excludes whole sections of the spec (e.g. 8 -- Patient-Confidential) as 
well as data types that may contain PHI ('5-digit or 9-digit U.S. ZIP codes%', 
'Text--%', etc.):

...
and ns.SectionID in (
  1 -- Cancer Identification
 , 2 -- Demographic
-- , 3 -- Edit Overrides/Conversion History/System Admin
 , 4 -- Follow-up/Recurrence/Death
-- , 5 -- Hospital-Confidential
 , 6 -- Hospital-Specific
-- , 7 -- Other-Confidential
-- , 8 -- Patient-Confidential
-- , 9 -- Record ID
-- , 10 -- Special Use
 , 11 -- Stage/Prognostic Factors -- TODO: numeric stuff
-- , 12 -- Text-Diagnosis
-- , 13 -- Text-Miscellaneous
-- , 14 -- Text-Treatment
-- , 15 -- Treatment-1st Course
, 16 -- Treatment-Subsequent & Other
, 17 -- Pathology
)
-- TODO: store these in the ID repository and de-id later
and ni."AllowValue" not like 'City name or UNKNOWN'
and ni."AllowValue" not like 'Reference to EDITS table BPLACE.DBF in Appendix B'
and ni."AllowValue" not like '5-digit or 9-digit U.S. ZIP codes%'
and ni."AllowValue" not like 'Census Tract Codes%'
and ni."AllowValue" not like 'See Appendix A for standard FIPS county codes%'
and ni."AllowValue" not like 'See Appendix A for county codes for each state.%'
and ni."ItemName" not like 'Age at Diagnosis'
and ni."ItemName" not like 'Text--%'
and ni."ItemName" not like 'Place of Death'
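
To make the filter above easier to follow, here is a rough Python paraphrase of
that WHERE clause. It is a sketch, not the HERON code: the function names like
and keep_item are hypothetical, the example AllowValue strings in the usage
lines are made up, and SQL NULL semantics are not modeled (inputs are assumed
to be plain strings).

```python
import re

# Section IDs on the uncommented lines of the IN (...) list above.
KEPT_SECTIONS = {1, 2, 4, 6, 11, 16, 17}

# Patterns copied from the "not like" conditions above.
ALLOW_VALUE_EXCLUDES = [
    "City name or UNKNOWN",
    "Reference to EDITS table BPLACE.DBF in Appendix B",
    "5-digit or 9-digit U.S. ZIP codes%",
    "Census Tract Codes%",
    "See Appendix A for standard FIPS county codes%",
    "See Appendix A for county codes for each state.%",
]
ITEM_NAME_EXCLUDES = ["Age at Diagnosis", "Text--%", "Place of Death"]


def like(value, pattern):
    """Approximate SQL LIKE: % matches any run, _ matches one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


def keep_item(section_id, item_name, allow_value):
    """True if an item passes the section list and the PHI exclusions."""
    if section_id not in KEPT_SECTIONS:
        return False
    if any(like(allow_value, p) for p in ALLOW_VALUE_EXCLUDES):
        return False
    return not any(like(item_name, p) for p in ITEM_NAME_EXCLUDES)


if __name__ == "__main__":
    # Illustrative inputs only; the AllowValue strings are invented.
    print(keep_item(1, "Grade", "1-4, 9"))
    print(keep_item(2, "Postal Code", "5-digit or 9-digit U.S. ZIP codes or blank"))
    print(keep_item(12, "Text--DX Proc--PE", "Free text"))
```

An item has to pass both tests: its section must be on the kept list, and
neither its allowed-value description nor its name may match an excluded
pattern.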

--
Dan


________________________________
From: Tamara McMahon
Sent: Friday, January 24, 2014 3:52 PM
To: Dan Connolly
Subject: FW: NAACCR data

Do we have a list I can point her to?

From: Chrischilles, Elizabeth A [mailto:[email protected]]
Sent: Friday, January 24, 2014 1:17 PM
To: Tamara McMahon
Subject: RE: NAACCR data

Tamara,
Is there a list of the NAACCR items that the KUMC is currently extracting from 
NAACCR?  That would help.
Betsy

From: Tamara McMahon [mailto:[email protected]]
Sent: Wednesday, January 22, 2014 3:54 PM
To: Chrischilles, Elizabeth A
Subject: GPC: NAACCR data

I spoke with Dan who is heading the Development and Standards work for PCORI.  
Currently there is no timeline set for having each location online with NAACCR 
data.  This is an agenda item for our KUMC PM meeting tomorrow.  I’ll let you 
know the results and when we can expect NAACCR data available via data 
warehouses.

Dan did mention that the breast cancer group will need to provide him a list of 
data needed for the project.  GPC isn’t going to standardize all the NAACCR 
data across all sites in the initial 18 months but focus on the needed data 
elements for the study.  So, the breast cancer group should define what data 
elements, both NAACCR and non-NAACCR, (e.g., diagnosis X, site, behavior, 
recurrence, specific medication types, certain procedures, etc.)  are needed 
for the study.  If there are any known codes, such as CPT, ICD-9, ICD-O-3, that 
would help too.

Thanks,
Tamara McMahon
Clinical Informatics Coordinator
Division of Medical Informatics
University of Kansas Medical Center
913-945-7470
