In the "People with at least one ordered medications specific to Diabetes
Mellitus" query, I see:
UPPER(a.RAW_RX_MED_NAME) like UPPER('%Acetohexamide%') or
UPPER(a.RAW_RX_MED_NAME) like UPPER('%D[i,y]melor%') or
I don't think we (KUMC) populate RAW_RX_MED_NAME. Not sure about other
I notice a long list of rxnorm_cuis in the word
e.g. 1156200. Are those in the .sql? Oh, yes, they are. Never mind.
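For what it's worth, here is a rough Python sketch of how the two kinds of criteria in that query could be combined, so that a site with an empty RAW_RX_MED_NAME still matches on RxNorm CUI. It is not the query itself. The function name and the one-element CUI set are made up for illustration, and it assumes the bracket in 'D[i,y]melor' is meant as a character class (Dimelor or Dymelor); whether LIKE honors that depends on the SQL dialect.

```python
import re

# Assumption: '[i,y]' in the quoted pattern is intended to mean "i or y".
NAME_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"Acetohexamide", r"D[iy]melor")
]

# 1156200 is the only CUI quoted in this thread; the full list is in the .sql.
DIABETES_RXNORM_CUIS = {"1156200"}


def is_diabetes_med(raw_rx_med_name, rxnorm_cui):
    """Return True if the RxNorm CUI or the raw name identifies a listed medication."""
    if rxnorm_cui is not None and str(rxnorm_cui) in DIABETES_RXNORM_CUIS:
        return True
    if raw_rx_med_name:  # None or empty at sites that do not populate the column
        return any(p.search(raw_rx_med_name) for p in NAME_PATTERNS)
    return False
```

With this shape, a row with no raw name but a listed CUI still counts, which is the case that matters for us.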
From: Dan Connolly
Sent: Monday, November 07, 2016 2:43 PM
To: Al'ona Furmanchuk; Satyender Goel
Cc: <email@example.com>; Abel Kho
Subject: data collection for next-D: i2b2, babel, OMOP, PCORnet CDM, ACS,
There's a lot of alphabet soup here. In preparation for the Nov 15 call, I'd
like to get the discussion started in email. (Note the gpc-dev public
I would prefer to work backward from a mocked up spreadsheet. My questions of
* Does the desired form of the data have one row per patient?
* or per visit?
* Is patient-day a good enough definition of visit?
* what columns / observations / variables are expected for each row?
* Nominal, Ordinal, Interval or Ratio?
* codes for nominals?
after talking with Bernie Black and Abel Kho said organize as row-per-visit;
yes, patient-day is close enough. She was reluctant to give specifics on
columns, but she said the following are the categories of variables listed in the
* Clinical Variables in EMR:
. Demographics: gender, race
. Treatment: standard diabetes medications
. Response to treatment: HbA1c levels, systolic and diastolic blood pressure,
HDL and LDL cholesterol, triglycerides
. Medication adherence: pharmacy fill data or refill rates
. Treatment adherence: weight checks at least twice a year
. Physician adherence: orders for HbA1c, urine microalbumin, pneumonia and flu
vaccine, and documented annual foot and eye exams
. Health outcomes: renal disease, peripheral artery disease/amputation,
retinopathy, cardiovascular disease (coronary events and ischemic stroke)
* Supplemental Demographic Variables in Geocoded Data:
* Income, education, likelihood of employment, poverty status,
owner-occupied house value, health insurance coverage, etc.
It would help if there were a shared copy of the proposal that we can all refer
to, by the way.
I just put what I know in a next-D
in Google Sheets. Feel free to comment and suggest changes. It includes
details such as that we would use 05 to represent race=White and 03 for Black,
(following the PCORnet data model). The first sheet has mocked up data and the
2nd sheet is a REDCap data dictionary.
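To make the nominal coding concrete, here is a minimal Python sketch. It covers only the two race codes mentioned above; the rest of the value set is left out rather than guessed. The helper names are hypothetical, and the "code, label | code, label" string is the usual shape of a REDCap data dictionary choices cell.

```python
# Only the two codes mentioned in this message; other race codes are omitted.
# Codes are kept as strings so the leading zero survives a spreadsheet round trip.
RACE_CODES = {"05": "White", "03": "Black"}


def redcap_choices(codes):
    """Format a code-to-label mapping as a REDCap choices string."""
    return " | ".join(f"{code}, {label}" for code, label in sorted(codes.items()))


def decode_race(code):
    """Look up the label for a race code, flagging anything not in the mapping."""
    return RACE_CODES.get(code, "unmapped")


choices_cell = redcap_choices(RACE_CODES)
```

Storing the codes as text rather than numbers is deliberate: 05 and 5 should not be allowed to drift apart between the mocked-up sheet and the data dictionary.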
If we are to collect "Treatment: standard diabetes medications" then we need a
similar level of detail. OMOP seems to have very mature methods for handling
drug exposures, but we don't have much experience with that. In a recent data
collection for breast cancer, we used a REDCap drop-down list of relevant
RxNorm codes drawn from the GPC terminology. This is where i2b2 and
babel<https://babel.gpcnetwork.org/> come in. With a babel account, you can
browse and get details on the terminology as well as a rough sense of what data
is available from each GPC site. (It's possible to assemble and save a query
that can be actually run at all sites, though that's a bit labor-intensive at
For HbA1c, there may be an issue of which LOINC code to use, but I expect we
can set that aside since we had to address it for the PCORnet CDM
LAB_RESULT_CDM table. But there may be multiple such results in a single visit.
In one recent study, I used the median to aggregate them. Would that approach
be appropriate here?
And so on for the other clinical EMR data.
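As a sketch of the median idea, assuming a visit is a patient-day and that lab results arrive as (patient, date, value) tuples, the aggregation could look like the following in Python. The function name and the sample values are made up for illustration; nothing here is drawn from real data.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def median_per_patient_day(results):
    """Collapse (patient_id, result_date, value) tuples to one median per patient-day."""
    by_visit = defaultdict(list)
    for patient_id, result_date, value in results:
        by_visit[(patient_id, result_date)].append(value)
    return {visit: median(values) for visit, values in by_visit.items()}


# Mocked-up HbA1c results: three on one day for p1, then one later, plus one for p2.
hba1c = [
    ("p1", date(2016, 3, 1), 7.2),
    ("p1", date(2016, 3, 1), 7.6),
    ("p1", date(2016, 3, 1), 9.0),
    ("p1", date(2016, 9, 12), 6.9),
    ("p2", date(2016, 3, 1), 8.1),
]
rows = median_per_patient_day(hba1c)
```

Each key in `rows` corresponds to one row of a row-per-visit spreadsheet. The median is less sensitive than the mean to a single outlying result on the same day, which is the main reason to prefer it here, but the choice is open for discussion.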
For income, I have been working with UHD001 Median household income in the past
12 months (in 2013 inflation-adjusted dollars) from ACS. The ACS has 4000+
variables including 15 "median household income" variables (see
Which of those 4000+ variables would you like to use for education,
employment, poverty, house value, health insurance coverage, etc?