Hi Dr. Chrischilles,

In identifying data elements important for characterizing the breast cancer 
cohort, it's not essential that your group identify specific code sets 
initially. We're happy to start with whatever terms you typically use in 
discussions with your peers, patients, etc. If you happen to know the relevant 
standard coding systems, that's great; if not, I'm sure we can work 
together to identify them.

Meanwhile...

The closest thing I have handy to a list of the NAACCR items that KUMC is 
currently extracting is probably our 
heron_load/curated_data/naaccr_shortcuts.csv<https://informatics.kumc.edu/work/browser/heron_load/curated_data/naaccr_shortcuts.csv>
file:

PARENT              KEY                            NAME                                KIND        KEY_TYPE
                    Abridged                       Cancer Case Identifiers (Abridged)  fabricated
Abridged            Demographics                   Cancer Demographics (Selected)      fabricated
Demographics        0160 Race                                                          tree        name
Demographics        0190 Spanish/Hispanic Origin                                       tree        name
Demographics        0220 Sex                                                           tree        name
Demographics        0240 Date of Birth                                                 tree        name
Abridged            0521 Morph                                                         tree        name
Abridged            0490 Diagnostic Confirmation                                       tree        name
Abridged            0523 Behavior Code ICD-O-3                                         tree        name
Abridged            0440 Grade                                                         tree        name
Abridged            0390 Date of Diagnosis                                             tree        name
Abridged            SEER Site Summary                                                  tree        name
Abridged            0630 Primary Payer at DX                                           tree        name
Abridged            3000 Derived AJCC-6 Stage Grp                                      tree        name
Abridged            3430 Derived AJCC-7 Stage Grp                                      tree        name
Abridged            0910 TNM Path Stage Group                                          tree        name
Abridged            0970 TNM Clin Stage Group                                          tree        name
Abridged            1750 Date of Last Contact                                          tree        name
Abridged            1760 Vital Status                                                  tree        name
Abridged            1860 Recurrence Date--1st                                          tree        name
Abridged            1880 Recurrence Type--1st                                          tree        name
Abridged            1910 Cause of Death                                                tree        name
Abridged            0610 Class of Case             0610 Class of Case (Selected)       node        name
0610 Class of Case  Analytic                       Analytic                            fabricated
0610 Class of Case  Non-Analytic                   Non-Analytic                        fabricated
Analytic            NAACCR|610:00                                                      tree        basecode
Analytic            NAACCR|610:10                                                      tree        basecode
Analytic            NAACCR|610:11                                                      tree        basecode
Analytic            NAACCR|610:12                                                      tree        basecode
Analytic            NAACCR|610:13                                                      tree        basecode
Analytic            NAACCR|610:14                                                      tree        basecode
Analytic            NAACCR|610:20                                                      tree        basecode
Analytic            NAACCR|610:21                                                      tree        basecode
Analytic            NAACCR|610:22                                                      tree        basecode
Non-Analytic        NAACCR|610:30                                                      tree        basecode
Non-Analytic        NAACCR|610:31                                                      tree        basecode
Non-Analytic        NAACCR|610:32                                                      tree        basecode
Non-Analytic        NAACCR|610:33                                                      tree        basecode
Non-Analytic        NAACCR|610:34                                                      tree        basecode
Non-Analytic        NAACCR|610:36                                                      tree        basecode
Non-Analytic        NAACCR|610:38                                                      tree        basecode
Non-Analytic        NAACCR|610:40                                                      tree        basecode
Non-Analytic        NAACCR|610:99                                                      tree        basecode

That's the result of a discussion with cancer researchers at KUMC.

(It looks nicer when presented through the HERON/i2b2 web user interface. I 
should probably make a screenshot, but I don't have VPN access set up just now.)
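
For anyone reading without access to the web UI, here is a rough Python sketch 
of how the rows in the listing nest via the PARENT and KEY columns. It is only 
an illustration: I'm assuming the real file is comma-separated with the header 
shown, SAMPLE is a hand-copied subset of the rows, and build_tree and outline 
are made-up helper names, not anything in heron_load.

```python
import csv
import io
from collections import defaultdict

# Hand-copied subset of the listing; assumed to be comma-separated
# with this header in the real file.
SAMPLE = """\
PARENT,KEY,NAME,KIND,KEY_TYPE
,Abridged,Cancer Case Identifiers (Abridged),fabricated,
Abridged,Demographics,Cancer Demographics (Selected),fabricated,
Demographics,0160 Race,,tree,name
Demographics,0220 Sex,,tree,name
Abridged,0521 Morph,,tree,name
Abridged,0610 Class of Case,0610 Class of Case (Selected),node,name
0610 Class of Case,Analytic,Analytic,fabricated,
0610 Class of Case,Non-Analytic,Non-Analytic,fabricated,
Analytic,NAACCR|610:00,,tree,basecode
Non-Analytic,NAACCR|610:30,,tree,basecode
"""


def build_tree(text):
    """Group rows by PARENT; rows with an empty PARENT are roots."""
    children = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        children[row["PARENT"]].append(row)
    return children


def outline(children, parent="", depth=0):
    """Return the hierarchy as indented lines, depth-first, in file order."""
    lines = []
    for row in children.get(parent, []):
        # Fall back to KEY when the NAME column is empty.
        label = row["NAME"] or row["KEY"]
        lines.append("  " * depth + label)
        lines.extend(outline(children, row["KEY"], depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(build_tree(SAMPLE))))
```

Running it prints an indented outline with "Cancer Case Identifiers (Abridged)" 
at the top and the NAACCR|610:nn codes nested under Analytic and Non-Analytic. 
It says nothing about how KIND and KEY_TYPE are interpreted by the loader.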

The cancer-specific data elements seem to be:

  *   0521 Morph (morphology)
  *   0490 Diagnostic Confirmation (I'm not sure what that is)
  *   0523 Behavior Code ICD-O-3
  *   0440 Grade
  *   SEER Site Summary
  *   3000 Derived AJCC-6 Stage Grp
  *   ... stage according to a few other sources
  *   1880 Recurrence Type--1st
  *   0610 Class of Case

The demographics and vital status are somewhat redundant with other demographic 
data in HERON; further harmonization is on our todo list, but so far, our 
approach has been to err on the side of making data available and letting 
investigators weed out the redundancies.

Thanks for letting me include the public gpc-dev forum in this discussion. In 
that context, a few further technical details...


The original KUMC/HERON 
TumorRegistry<https://informatics.kumc.edu/work/wiki/TumorRegistry> approach 
did not involve curating specific NAACCR items. Rather, it uses a generic 
algorithm that grabs everything in the NAACCR file and weeds out stuff that 
might be PHI. The result is a term hierarchy that parallels the text of the 
NAACCR format spec:


  *   Thornton M (ed). Standards for Cancer Registries Volume II: Data 
Standards and Data 
Dictionary<http://www.naaccr.org/LinkClick.aspx?fileticket=LJJNRVo4lT4%3d&tabid=133&mid=473>,
 Record Layout Version 12.1, 15th ed. Springfield, Ill.: North American 
Association of Central Cancer Registries, June 2010.

The result is pretty overwhelming, from a usability perspective. Hence the list 
of "shortcuts" above.

FWIW, the code for the generic algorithm is in 
naaccr_txform.sql<https://informatics.kumc.edu/work/browser/heron_load/naaccr_txform.sql>;
 it excludes whole sections of the spec (e.g. 8 -- Patient-Confidential) as 
well as items whose allowed values or names suggest they may contain PHI 
('5-digit or 9-digit U.S. ZIP codes%', 'Text--%', etc.)

...
and ns.SectionID in (
  1 -- Cancer Identification
 , 2 -- Demographic
-- , 3 -- Edit Overrides/Conversion History/System Admin
 , 4 -- Follow-up/Recurrence/Death
-- , 5 -- Hospital-Confidential
 , 6 -- Hospital-Specific
-- , 7 -- Other-Confidential
-- , 8 -- Patient-Confidential
-- , 9 -- Record ID
-- , 10 -- Special Use
 , 11 -- Stage/Prognostic Factors -- TODO: numeric stuff
-- , 12 -- Text-Diagnosis
-- , 13 -- Text-Miscellaneous
-- , 14 -- Text-Treatment
-- , 15 -- Treatment-1st Course
, 16 -- Treatment-Subsequent & Other
, 17 -- Pathology
)
-- TODO: store these in the ID repository and de-id later
and ni."AllowValue" not like 'City name or UNKNOWN'
and ni."AllowValue" not like 'Reference to EDITS table BPLACE.DBF in Appendix B'
and ni."AllowValue" not like '5-digit or 9-digit U.S. ZIP codes%'
and ni."AllowValue" not like 'Census Tract Codes%'
and ni."AllowValue" not like 'See Appendix A for standard FIPS county codes%'
and ni."AllowValue" not like 'See Appendix A for county codes for each state.%'
and ni."ItemName" not like 'Age at Diagnosis'
and ni."ItemName" not like 'Text--%'
and ni."ItemName" not like 'Place of Death'
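
To make the filtering logic easier to follow outside the database, here is a 
rough Python sketch of the same three tests: the section whitelist, the 
AllowValue exclusions, and the ItemName exclusions. The section IDs and 
patterns are copied from the excerpt; everything else is an assumption for 
illustration, including the function names like() and keep_item() and the 
sample arguments in the usage lines. It does not model SQL NULL handling.

```python
import re

# Section IDs left uncommented in the query excerpt.
INCLUDED_SECTIONS = {1, 2, 4, 6, 11, 16, 17}

# LIKE patterns copied from the query excerpt.
EXCLUDED_ALLOW_VALUES = [
    "City name or UNKNOWN",
    "Reference to EDITS table BPLACE.DBF in Appendix B",
    "5-digit or 9-digit U.S. ZIP codes%",
    "Census Tract Codes%",
    "See Appendix A for standard FIPS county codes%",
    "See Appendix A for county codes for each state.%",
]
EXCLUDED_ITEM_NAMES = ["Age at Diagnosis", "Text--%", "Place of Death"]


def like(value, pattern):
    """Approximate SQL LIKE: % matches any run of characters, _ matches one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


def keep_item(section_id, item_name, allow_value):
    """True if an item passes the section whitelist and both exclusion lists."""
    if section_id not in INCLUDED_SECTIONS:
        return False
    if any(like(allow_value, p) for p in EXCLUDED_ALLOW_VALUES):
        return False
    if any(like(item_name, p) for p in EXCLUDED_ITEM_NAMES):
        return False
    return True


if __name__ == "__main__":
    # Hypothetical sample rows; the section numbers and AllowValue text are made up.
    print(keep_item(1, "Grade", "1-4, 9"))
    print(keep_item(1, "Text--Remarks", "free text"))
    print(keep_item(8, "Grade", "1-4, 9"))
```

Note that a pattern without a trailing % (e.g. 'City name or UNKNOWN') only 
excludes an exact match, both in the SQL and in this sketch.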

--
Dan


________________________________
From: Tamara McMahon
Sent: Friday, January 24, 2014 3:52 PM
To: Dan Connolly
Subject: FW: NAACCR data

Do we have a list I can point her to?

From: Chrischilles, Elizabeth A
Sent: Friday, January 24, 2014 1:17 PM
To: Tamara McMahon
Subject: RE: NAACCR data

Tamara,
Is there a list of the NAACCR items that the KUMC is currently extracting from 
NAACCR?  That would help.
Betsy

From: Tamara McMahon
Sent: Wednesday, January 22, 2014 3:54 PM
To: Chrischilles, Elizabeth A
Subject: GPC: NAACCR data

I spoke with Dan, who is heading the Development and Standards work for PCORI.  
Currently there is no timeline set for having each location online with NAACCR 
data.  This is an agenda item for our KUMC PM meeting tomorrow.  I’ll let you 
know the results and when we can expect NAACCR data to be available via data 
warehouses.

Dan did mention that the breast cancer group will need to provide him a list of 
data needed for the project.  GPC isn’t going to standardize all the NAACCR 
data across all sites in the initial 18 months but will focus on the data 
elements needed for the study.  So, the breast cancer group should define what 
data elements, both NAACCR and non-NAACCR (e.g., diagnosis X, site, behavior, 
recurrence, specific medication types, certain procedures, etc.), are needed 
for the study.  If there are any known codes, such as CPT, ICD-9, ICD-O-3, that 
would help too.

Thanks,
Tamara McMahon
Clinical Informatics Coordinator
Division of Medical Informatics
University of Kansas Medical Center
913-945-7470
