We talked about 10s of thousands; now I read:

  1.  Denominator or study population

Minimum two encounters (inpatient, outpatient, and emergency; see Part A1 in
Appendix A) in EDW at any time on different days.

AND age between 18 and 89 years old at time of encounter. Age is defined as the
difference between the date of birth and the date of encounter.
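To check my reading of the criteria, here is a minimal Python sketch of the denominator. It assumes encounters carry PCORnet CDM ENC_TYPE codes IP, AV and ED (my guess at how Part A1 of Appendix A maps), that "age" means completed years between birth date and encounter date, and that an encounter counts only if the patient was 18 to 89 on that day. Encounter, age_at and denominator are hypothetical names.

```python
from datetime import date
from typing import Iterable, NamedTuple

# Assumption: PCORnet CDM ENC_TYPE codes for inpatient, outpatient, emergency.
QUALIFYING_TYPES = {"IP", "AV", "ED"}


class Encounter(NamedTuple):
    patid: str
    enc_type: str
    admit_date: date


def age_at(birth_date: date, on: date) -> int:
    """Age in completed years on a given date (assumed reading of 'difference')."""
    had_birthday = (on.month, on.day) >= (birth_date.month, birth_date.day)
    return on.year - birth_date.year - (0 if had_birthday else 1)


def denominator(encounters: Iterable[Encounter],
                birth_dates: dict[str, date],
                min_age: int = 18, max_age: int = 89) -> set[str]:
    """Patients with qualifying encounters on at least two different days,
    counting only encounters where the patient was min_age..max_age."""
    days: dict[str, set[date]] = {}
    for enc in encounters:
        if enc.enc_type not in QUALIFYING_TYPES:
            continue
        dob = birth_dates.get(enc.patid)
        if dob is None:  # no birth date: age cannot be checked
            continue
        if min_age <= age_at(dob, enc.admit_date) <= max_age:
            # A set of dates, so two encounters on the same day count once.
            days.setdefault(enc.patid, set()).add(enc.admit_date)
    return {patid for patid, d in days.items() if len(d) >= 2}


encs = [
    Encounter("p1", "AV", date(2015, 3, 1)),
    Encounter("p1", "ED", date(2015, 6, 9)),
    Encounter("p2", "AV", date(2015, 3, 1)),
    Encounter("p2", "AV", date(2015, 3, 1)),  # same day as above
]
print(denominator(encs, {"p1": date(1970, 1, 1), "p2": date(1960, 5, 5)}))  # {'p1'}
```

If that reading is right, nearly every adult seen twice since the CDM cutoff qualifies, which is what prompts the question below.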

Am I reading this right? If so, that makes the denominator more like 100s of 
thousands per site... or millions. More or less everybody.

Well... not quite everybody...

The SQL code refers to CDM tables. We follow SCILHS in cutting off our CDM
population at Jan 1, 2010. Is that as expected?

And we'll be collecting Clinical Variables in the EHR and Geocoded Data from 
the whole denominator population?

ref NEXT-D Data Request 
Detail<https://informatics.gpcnetwork.org/trac/Project/attachment/ticket/539/NEXT-D_Request%20for%20Data_Detailed_12.1.16.docx>


--
Dan

________________________________
From: Dan Connolly
Sent: Monday, November 07, 2016 2:43 PM
To: Al'ona Furmanchuk; Satyender Goel
Cc: <gpc-dev@listserv.kumc.edu>; Abel Kho
Subject: data collection for next-D: i2b2, babel, OMOP, PCORnet CDM, ACS, 
geocoding

There's a lot of alphabet soup here. In preparation for the Nov 15 call, I'd
like to get the discussion started over email. (Note the gpc-dev public
archive<http://listserv.kumc.edu/pipermail/gpc-dev/>.)

I would prefer to work backward from a mocked-up spreadsheet. My questions of
19 Sep<https://informatics.gpcnetwork.org/trac/Project/ticket/140#comment:53>
were:

  *   Does the desired form of the data have one row per patient?
     *   or per visit?
        *   Is patient-day a good enough definition of visit?
  *   what columns / observations / variables are expected for each row?
     *   Nominal, Ordinal, Interval or Ratio?
     *   codes for nominals?
     *   units?
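If patient-day is the definition of visit, building row-per-visit data mostly means grouping encounters by (patient, date). A minimal sketch, assuming each encounter arrives as a hypothetical (patid, encounter_id, encounter_date) tuple; to_patient_days is an illustrative name, not an existing tool.

```python
from collections import defaultdict
from datetime import date


def to_patient_days(encounters):
    """Group (patid, encounter_id, encounter_date) tuples into one row per
    patient-day, keeping the ids of every encounter that fell on that day."""
    rows = defaultdict(list)
    for patid, encounter_id, encounter_date in encounters:
        rows[(patid, encounter_date)].append(encounter_id)
    # Sort by patient, then date, so rows come out in a stable order.
    return dict(sorted(rows.items()))


visits = to_patient_days([
    ("p1", "e1", date(2016, 1, 4)),  # ED visit ...
    ("p1", "e2", date(2016, 1, 4)),  # ... followed by a same-day admission
    ("p1", "e3", date(2016, 2, 1)),
])
for (patid, day), encounter_ids in visits.items():
    print(patid, day, encounter_ids)
# p1 2016-01-04 ['e1', 'e2']
# p1 2016-02-01 ['e3']
```

One consequence of this choice: an ED visit that turns into an admission on the same day becomes a single row, while a multi-day inpatient stay is keyed only by its admit date unless it is expanded day by day.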

Mei's
response<https://informatics.gpcnetwork.org/trac/Project/ticket/140#comment:54>,
after talking with Bernie Black and Abel Kho, said to organize as row-per-visit;
yes, patient-day is close enough. She was reluctant to give specifics on
columns, but she said the following are the categories of variables listed in
the proposal:

  *   Clinical Variables in EMR:

     *   Demographics: gender, race
     *   Treatment: standard diabetes medications
     *   Response to treatment: HbA1c levels, systolic and diastolic blood pressure,
HDL and LDL cholesterol, triglycerides
     *   Medication adherence: pharmacy fill data or refill rates
     *   Treatment adherence: weights, checks at least twice a year
     *   Physician adherence: orders for HbA1c, urine microalbumin, pneumonia and flu
vaccine, and documented annual foot and eye exams
     *   Health outcomes: renal disease, peripheral artery disease/amputation,
retinopathy, cardiovascular disease (coronary events and ischemic stroke)

  *   Supplemental Demographic Variables in Geocoded Data:
     *   Income, education, likelihood of employment, poverty status, 
owner-occupied house value, health insurance coverage, etc.

It would help if there were a shared copy of the proposal that we can all refer 
to, by the way.

I just put what I know in a next-D
mock-up<https://docs.google.com/spreadsheets/d/12h3fwK_AZYPCU28XVfu8n45bn6DUQ4qwY9zvgFWozow/edit#gid=1012432412>
in Google Sheets. Feel free to comment and suggest changes. It includes
details such as using 05 to represent race=White and 03 for Black
(following the PCORnet data model). The first sheet has mocked-up data and the
second sheet is a REDCap data dictionary.
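For a REDCap dropdown or radio field, coded values go in the data dictionary's "Choices, Calculations, OR Slider Labels" column as "code, label" pairs separated by " | ". A small sketch of that formatting, using only the two race codes mentioned above; redcap_choices is a hypothetical helper, and the remaining PCORnet race values should be copied from the CDM specification rather than filled in from memory.

```python
# Only the two codes named in this message; other PCORnet CDM RACE values
# are deliberately left out here and should come from the CDM specification.
RACE_CHOICES = {
    "03": "Black",
    "05": "White",
}


def redcap_choices(codes):
    """Format code/label pairs for a REDCap data dictionary choices column:
    'code, label | code, label'."""
    return " | ".join(f"{code}, {label}" for code, label in codes.items())


print(redcap_choices(RACE_CHOICES))  # 03, Black | 05, White
```

Keeping the codes as strings rather than integers preserves the leading zero in the dictionary itself.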

If we are to collect "Treatment: standard diabetes medications" then we need a 
similar level of detail. OMOP seems to have very mature methods for handling 
drug exposures, but we don't have much experience with that. In a recent data 
collection for breast cancer, we used a REDCap drop-down list of relevant 
RxNorm codes drawn from the GPC terminology. This is where i2b2 and
babel<https://babel.gpcnetwork.org/> come in. With a babel account, you can 
browse and get details on the terminology as well as a rough sense of what data 
is available from each GPC site. (It's possible to assemble and save a query 
that can actually be run at all sites, though that's a bit labor-intensive at
this point.)
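Assembling such a drop-down for "standard diabetes medications" starts with picking the relevant codes out of a terminology export. A crude sketch, assuming a hypothetical list of (code, name) rows such as one might export after browsing in babel; the codes below are placeholders, not real RxCUIs, and name matching is only a stand-in for RxNorm's ingredient relationships.

```python
def codes_by_ingredient(terms, ingredients):
    """Map each ingredient to the codes whose display name mentions it
    (case-insensitive substring match). Combination products such as
    'glipizide / metformin' are listed under every ingredient they contain."""
    result = {ingredient: [] for ingredient in ingredients}
    for code, name in terms:
        lowered = name.lower()
        for ingredient in ingredients:
            if ingredient.lower() in lowered:
                result[ingredient].append(code)
    return result


# Placeholder codes; not real RxNorm identifiers.
terms = [
    ("X1", "metformin hydrochloride 500 MG Oral Tablet"),
    ("X2", "glipizide 5 MG Oral Tablet"),
    ("X3", "lisinopril 10 MG Oral Tablet"),
    ("X4", "glipizide / metformin Oral Tablet"),
]
print(codes_by_ingredient(terms, ["metformin", "glipizide"]))
# {'metformin': ['X1', 'X4'], 'glipizide': ['X2', 'X4']}
```

Substring matching will both miss brand-name-only entries and over-match unrelated names, so the resulting list would still need a human review before it becomes a REDCap drop-down.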

For HbA1c, there may be an issue of which LOINC code to use, but I expect we 
can set that aside since we had to address it for the PCORnet CDM
LAB_RESULT_CDM table. But there may be multiple such results in a single visit. 
In one recent study, I used the median to aggregate them. Would that approach 
be appropriate here?
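Assuming the LOINC selection is settled upstream and a visit is a patient-day, the median aggregation is a group-by over (patient, date). A minimal sketch with hypothetical input tuples and made-up values; median_per_visit is an illustrative name. Python's statistics.median averages the two middle values when a day has an even number of results.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def median_per_visit(results):
    """results: iterable of (patid, result_date, value) for one lab test.
    Returns {(patid, result_date): median of that day's non-missing values}."""
    grouped = defaultdict(list)
    for patid, result_date, value in results:
        if value is not None:  # skip missing results rather than failing
            grouped[(patid, result_date)].append(value)
    return {key: median(values) for key, values in grouped.items()}


# Made-up HbA1c values (%), for illustration only.
hba1c = [
    ("p1", date(2016, 3, 1), 7.1),
    ("p1", date(2016, 3, 1), 7.5),
    ("p1", date(2016, 3, 1), 9.0),
    ("p1", date(2016, 9, 1), 6.8),
]
print(median_per_visit(hba1c))
```

The three results on 2016-03-01 collapse to 7.5; the mean would be about 7.87, pulled up by the single 9.0, which is one argument for the median here.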

And so on for the other clinical EMR data.

For income, I have been working with UHD001 Median household income in the past 
12 months (in 2013 inflation-adjusted dollars) from ACS. The ACS has 4000+
variables, including 15 "median household income" variables (see
ticket:140#comment:17<https://informatics.gpcnetwork.org/trac/Project/ticket/140#comment:17>).
Which of those 4000+ variables would you like to use for education,
employment, poverty, house value, health insurance coverage, etc.?
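Whichever variables are chosen, attaching them to patients is a lookup keyed on the geography the geocoder returns. A minimal sketch, assuming geocoding yields a census tract GEOID per patient and the ACS extract is a table keyed by the same GEOID; attach_acs is a hypothetical name, UHD001 is the income variable mentioned above, and the GEOIDs and values are made up.

```python
def attach_acs(patient_geoids, acs_by_geoid, variables):
    """patient_geoids: {patid: geoid or None}
    acs_by_geoid: {geoid: {variable: value}}
    Returns {patid: {variable: value}}, with None where the patient was not
    geocoded or the geography/variable is missing from the ACS extract."""
    attached = {}
    for patid, geoid in patient_geoids.items():
        acs_row = acs_by_geoid.get(geoid, {}) if geoid else {}
        attached[patid] = {var: acs_row.get(var) for var in variables}
    return attached


# Made-up GEOIDs and values, for illustration only.
acs = {"20000000100": {"UHD001": 52000}, "20000000200": {"UHD001": 38000}}
patients = {"p1": "20000000100", "p2": "20000000200", "p3": None}
print(attach_acs(patients, acs, ["UHD001"]))
# {'p1': {'UHD001': 52000}, 'p2': {'UHD001': 38000}, 'p3': {'UHD001': None}}
```

Adding education, employment, poverty, house value or insurance columns would then just mean adding their variable names to the list once we agree which ones to use.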

--
Dan

_______________________________________________
Gpc-dev mailing list
Gpc-dev@listserv.kumc.edu
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
