There's a lot of alphabet soup here. In preparation for the Nov 15 call, I'd like to get the discussion started in email. (Note that gpc-dev has a public archive<http://listserv.kumc.edu/pipermail/gpc-dev/>.)
I would prefer to work backward from a mocked-up spreadsheet. My questions of 19 Sep<https://informatics.gpcnetwork.org/trac/Project/ticket/140#comment:53> were:

* Does the desired form of the data have one row per patient, or per visit?
  * Is patient-day a good enough definition of visit?
* What columns / observations / variables are expected for each row?
  * Nominal, ordinal, interval, or ratio?
  * Codes for nominals?
  * Units?

Mei's response<https://informatics.gpcnetwork.org/trac/Project/ticket/140#comment:54>, after talking with Bernie Black and Abel Kho, said to organize as row-per-visit; yes, patient-day is close enough. She was reluctant to give specifics on columns, but she said the following are the categories of variables listed in the proposal:

* Clinical Variables in EMR:
  * Demographics: gender, race
  * Treatment: standard diabetes medications
  * Response to treatment: HbA1c levels, systolic and diastolic blood pressure, HDL and LDL cholesterol, triglycerides
  * Medication adherence: pharmacy fill data or refill rates
  * Treatment adherence: weights, checks at least twice a year
  * Physician adherence: orders for HbA1c, urine microalbumin, pneumonia and flu vaccine, and documented annual foot and eye exams
  * Health outcomes: renal disease, peripheral artery disease/amputation, retinopathy, cardiovascular disease (coronary events and ischemic stroke)
* Supplemental Demographic Variables in Geocoded Data:
  * Income, education, likelihood of employment, poverty status, owner-occupied house value, health insurance coverage, etc.

It would help if there were a shared copy of the proposal that we can all refer to, by the way.

I just put what I know in a next-D mock-up<https://docs.google.com/spreadsheets/d/12h3fwK_AZYPCU28XVfu8n45bn6DUQ4qwY9zvgFWozow/edit#gid=1012432412> in Google Sheets. Feel free to comment and suggest changes. It includes details such as the codes we would use: 05 to represent race=White and 03 for Black (following the PCORnet data model).
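To make the row-per-visit shape concrete, here is a minimal Python sketch that treats each patient-day as one visit. It is an illustration under stated assumptions, not the mock-up itself: the patient IDs, variable names, timestamps, and values are all made up, and only the race codes (05, 03) come from the mock-up.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw observations: (patient_id, ISO timestamp, variable, value).
observations = [
    ("P1", "2016-03-01T09:15", "systolic_bp", 132),
    ("P1", "2016-03-01T09:15", "diastolic_bp", 84),
    ("P1", "2016-03-01T14:40", "hdl", 48),
    ("P1", "2016-06-10T10:05", "systolic_bp", 128),
    ("P2", "2016-03-01T11:30", "systolic_bp", 141),
]

# Race codes as in the mock-up: 05 = White, 03 = Black.
demographics = {"P1": {"race": "05"}, "P2": {"race": "03"}}


def rows_per_patient_day(observations, demographics):
    """Collapse observations into one row per (patient, calendar day)."""
    visits = defaultdict(dict)
    for patient_id, timestamp, variable, value in observations:
        day = datetime.fromisoformat(timestamp).date()
        # If a variable repeats within a day, the last value wins here;
        # a real extract would need an explicit aggregation rule.
        visits[(patient_id, day)][variable] = value

    rows = []
    for (patient_id, day), variables in sorted(visits.items()):
        row = {"patient_id": patient_id, "visit_date": day.isoformat()}
        row.update(demographics.get(patient_id, {}))
        row.update(variables)
        rows.append(row)
    return rows


for row in rows_per_patient_day(observations, demographics):
    print(row)
```

Note that the two P1 observations taken at different times on 2016-03-01 land in the same row, which is what "patient-day is close enough" implies.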
The first sheet has mocked-up data and the second sheet is a REDCap data dictionary.

If we are to collect "Treatment: standard diabetes medications" then we need a similar level of detail. OMOP seems to have very mature methods for handling drug exposures, but we don't have much experience with that. In a recent data collection for breast cancer, we used a REDCap drop-down list of relevant RxNorm codes drawn from the GPC terminology.

This is where i2b2 and babel<https://babel.gpcnetwork.org/> come in. With a babel account, you can browse and get details on the terminology as well as a rough sense of what data is available from each GPC site. (It's possible to assemble and save a query that can actually be run at all sites, though that's a bit labor-intensive at this point.)

For HbA1c, there may be an issue of which LOINC code to use, but I expect we can set that aside since we had to address it for the PCORnet CDM LAB_RESULT_CDM table. But there may be multiple such results in a single visit. In one recent study, I used the median to aggregate them. Would that approach be appropriate here?

And so on for the other clinical EMR data.

For income, I have been working with UHD001, "Median household income in the past 12 months (in 2013 inflation-adjusted dollars)", from the ACS. The ACS has 4000+ variables, including 15 "median household income" variables (see ticket:140#comment:17<https://informatics.gpcnetwork.org/trac/Project/ticket/140#comment:17>). Which of those 4000+ variables would you like to use for education, employment, poverty, house value, health insurance coverage, etc.?

-- 
Dan
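P.S. A minimal Python sketch of the median aggregation for multiple HbA1c results in one patient-day visit. This is an assumption-laden illustration rather than the code from the earlier study: the patient IDs, dates, and HbA1c values are invented, and `median_per_visit` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical HbA1c results in percent: (patient_id, result_date, value).
hba1c_results = [
    ("P1", date(2016, 3, 1), 7.9),
    ("P1", date(2016, 3, 1), 8.3),
    ("P1", date(2016, 3, 1), 8.1),
    ("P1", date(2016, 6, 10), 7.4),
    ("P2", date(2016, 3, 1), 6.8),
    ("P2", date(2016, 3, 1), 7.0),
]


def median_per_visit(results):
    """Reduce repeated results to one median value per (patient, day)."""
    grouped = defaultdict(list)
    for patient_id, day, value in results:
        grouped[(patient_id, day)].append(value)
    # statistics.median averages the two middle values for even counts.
    return {visit: median(values) for visit, values in grouped.items()}


for (patient_id, day), value in sorted(median_per_visit(hba1c_results).items()):
    print(patient_id, day.isoformat(), value)
```

The median is less sensitive than the mean to a single outlying result, but with an even number of results it yields a value that was never actually measured, which may matter for some analyses.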
