A few things I noticed that could be implementation risks:

- The other survey plans (ALS and breast cancer) seem technically simpler. For example, the ALS survey (which is closest to the Obesity survey in terms of implementation) might not have any scripts to run in its plan. Each honest broker will get a list of patients and share it with the study coordinator, who will manually go through the list and make sure it does not contain duplicates or deceased patients (and so on). Then they might place the data into a REDCap data import template and upload it via the REDCap Data Import Tool. But I understand that the Obesity survey cohort will be very large compared to ALS and might need something automated.
- We might have to accommodate differences in REDCap versions among the sites. For example, while working with folks at one of the GPC sites on a non-GPC project, we realized they use the REDCap LTS (Long Term Support) version, which might not have the same features as the standard REDCap release.

Thanks.

Regards,
Adagarla, Bhargav Srinivas

________________________________
From: Dan Connolly
Sent: Tuesday, March 03, 2015 9:17 AM
To: Teresa Bosler; Supreet Kathpalia
Cc: Bhargav Adagarla; [email protected]; Alex Bokov
Subject: RE: Current survey management plan for Obesity

Phillip / Teresa / UTSW, Supreet / UMN, what do you think of this plan? Recall you agreed to review it in our 17 Feb call. Review by others is, of course, more than welcome as well.

-- Dan

________________________________
From: [email protected] [[email protected]] on behalf of Alex Bokov [[email protected]]
Sent: Tuesday, February 17, 2015 11:11 AM
To: [email protected]
Subject: Current survey management plan for Obesity

The Obesity REDCap project will have a survey and a data form for study tracking:

* Tracker (contains invitation mail-out dates, non-automated response status info, etc.)
  * data can/will be uploaded in batch (Excel spreadsheet), e.g. to assign a mail-out date to wave X participants
  * records can also be edited in REDCap by the study coordinator
* Survey
  * contains survey responses
  * data entered by the study participant, or by the coordinator if the respondent uses phone/snail mail

I know that earlier we reported problems tracking respondents who have not yet filled out a survey. This turned out to be due to having auto-numbering turned on. If you run into this problem at your site and want more details, we can send those in a separate email.

Moving right along... The tracker will be the first form in the project, and it will be batch uploaded from Excel/CSV with the study-assigned ID in the first column, so this becomes the record linking field.
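As a minimal sketch of that upload file, Python's standard `csv` module is enough to write it with the study-assigned ID as the first column. The tracker variable names (`study_id`, `wave`, `mailout_date`, `invite_email`) are hypothetical and would have to match the project's data dictionary, and writing dates as YYYY-MM-DD is an assumption about the import settings.

```python
import csv
import io
from datetime import date

# Hypothetical tracker variable names; real ones come from the project's data dictionary.
# The study-assigned ID is listed first so it becomes the record ID (linking field).
TRACKER_FIELDS = ["study_id", "wave", "mailout_date", "invite_email"]


def tracker_csv(rows):
    """Return CSV text for a batch upload of the tracker form."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=TRACKER_FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        out = dict(row)
        # Write dates as YYYY-MM-DD (assumed import date format).
        if isinstance(out.get("mailout_date"), date):
            out["mailout_date"] = out["mailout_date"].isoformat()
        writer.writerow(out)
    return buf.getvalue()


print(tracker_csv([
    {"study_id": "SA-12-00001", "wave": 1,
     "mailout_date": date(2015, 3, 16), "invite_email": "invites@example.org"},
    {"study_id": "SA-21-00002", "wave": 1,
     "mailout_date": None, "invite_email": "invites@example.org"},
]))
```

Missing values, such as a mail-out date not yet assigned, come out as blank cells because the `csv` module writes `None` as an empty string.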
Then we will use the REDCap participant list to generate unique survey URLs to be printed on snail-mailed invitations. This will be done by sending dummy invitation emails in bulk to an email address we control, which will also be a field in the tracker form. The list of participants, along with their unique URLs, can then be exported for printing the snail-mail survey invitations (presumably via a mail-merge feature in the researcher's word processor).

We would also like to do the following, but have not tested anything yet:

* Export return codes for unique survey URLs (to help users who lost theirs)
  * Done before per the UMN REDCap FAQ, but requires a UMN-specific plugin
* Generate QR codes to be printed along with survey URLs
  * Done before per U Iowa survey tricks

When exporting data from Oracle into the tracker form, we use patient information (BMI percentile, age, sex) to assign the study number, which includes an encoded age-BMI bin plus extra digits or characters that guarantee no two respondents at different sites will have the same ID.

We use the emergency contact information to determine whom to mail the survey to and where. Patients can have multiple emergency contacts; for example, UTHSCSA patients may have up to five separate sets of emergency contact information in Epic. For choosing a contact, the tentative plan is to use the guardian if there is one, and otherwise fall back on mother, then father, then emergency contact 1, and finally emergency contact 2. We are working on ways to empirically decide the best order of precedence, and are also awaiting guidance from some of our experienced local study coordinators. If any of y'all have insights into this, please speak up.

Update: it appears that the contact address info in the PATIENT table itself is the most used, but we are still investigating where to pull the salutation from: PROXY_NAME or GUARDIAN_NAME (or some other field). Anybody have any thoughts?
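Here is a sketch of the contact fallback and the study-ID encoding described above, under loudly hypothetical assumptions. Contacts arrive as a dict keyed by role (`guardian`, `mother`, ...), each with an `address` entry. The age and BMI-percentile bin edges are made up for illustration (85 and 95 are the usual CDC overweight/obesity percentiles, but the study's own bins may differ). Cross-site uniqueness comes from a per-site prefix such as the hypothetical `SA` plus a per-site sequence number. Sex is not used in this encoding.

```python
from bisect import bisect_right

# Tentative order of precedence from the plan; the dictionary keys are hypothetical.
CONTACT_PRECEDENCE = ["guardian", "mother", "father",
                      "emergency_contact_1", "emergency_contact_2"]

# Hypothetical bin edges, for illustration only.
AGE_EDGES = [6, 12]       # years: bin 0 = under 6, 1 = 6 to under 12, 2 = 12 and over
BMI_PCT_EDGES = [85, 95]  # percentile: 0 = under 85, 1 = 85 to under 95, 2 = 95 and over


def choose_contact(contacts):
    """Return (role, contact) for the highest-precedence contact with an address, else None."""
    for role in CONTACT_PRECEDENCE:
        contact = contacts.get(role)
        if contact and contact.get("address"):
            return role, contact
    return None


def make_study_id(site_code, seq, age_years, bmi_percentile):
    """Encode the age-BMI bin; IDs stay unique across sites only if every site
    uses a distinct site_code and never reuses a sequence number."""
    age_bin = bisect_right(AGE_EDGES, age_years)
    bmi_bin = bisect_right(BMI_PCT_EDGES, bmi_percentile)
    return f"{site_code}-{age_bin}{bmi_bin}-{seq:05d}"
```

For example, `make_study_id("SA", 42, 10, 96)` gives `"SA-12-00042"`, and a patient whose mother record lacks an address falls through to the father.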
We also use contact information to reduce the number of duplicate mailings. To that end, we are interested in any tools for formatting addresses to USPS standards, comparing for duplicates, and validating addresses. Our current plan is:

* Sort the entries in random order.
* Randomly remove all but one record from each household, where a household is defined as follows:
  * Treat all case-normalized duplicate email addresses as representing the same household.
  * Remove all non-numeric characters and leading 1s from phone numbers, then treat all identical results as representing the same household.
  * Concatenate the ADDRESS1 and ADDRESS2 fields, replace all runs of whitespace with a single space, convert everything to lowercase, apply certain USPS conversion rules (converting to standard abbreviations except where this would cause ambiguity), and then treat all identical results as representing the same household.
* After the high-confidence duplicates are removed, there will remain clusters of similar addresses that may or may not be duplicates. We will not auto-cull them, but we will flag them in a way that will hopefully make them easier for a human to spot. The tentative plan is to:
  * Take each address (normalized for case, spaces, and abbreviations in the previous step) and calculate its Levenshtein distance to every other such address.
  * Assign all addresses whose distance is below our threshold the same randomly generated ID in the DUPLICATE_ID column.
  * Skip to the next address that doesn't yet have a DUPLICATE_ID and repeat the process until we run out of addresses.
  * Assign a DUPLICATE_ID of 0 to all addresses that have no other addresses below the similarity threshold.
  * The normalized addresses will not be part of the final output uploaded into REDCap, but the DUPLICATE_ID field will remain.
  * In REDCap we will create a report that pulls only entries where DUPLICATE_ID != 0 and sorts them by DUPLICATE_ID. A study coordinator would then glance through these clusters and, if they are actually different addresses (e.g. adjacent houses, apartments within the same building, and probably more exotic variants), change their DUPLICATE_ID to 0. The remaining entries in each cluster would be deleted except for the first one.

This will not remove all duplicates, only reduce them. The primary goal is not avoiding siblings; in fact, siblings living in separate households will most likely slip through. This is simply a limitation of this study design that we have to live with. The real reason removing duplicates matters is to minimize how many households we irritate with repeat mailings.

i2b2 does not contain these emergency contact fields. Therefore, even if a site has an identified i2b2 instance, it will not be useful for extracting contact information. We see no practical alternative at this time to pulling these fields from the Epic source, then generating the study IDs and doing duplicate detection within a Python script. This script will either output a CSV file ready to upload into REDCap to create the tracker, or create the tracker directly via the REDCap API. We will send this script out to the study sites. Unless anybody has any better suggestions?
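The duplicate-detection part of such a script could look roughly like the following Python sketch. It assumes records are dicts with hypothetical lowercase keys `email`, `phone`, `address1`, `address2`, and uses only a tiny illustrative subset of USPS abbreviations (the real table is USPS Publication 28, and the plan's "except where ambiguous" rules are not modeled). The distance threshold and the range of random IDs are hypothetical, and the household pass is greedy: a record is dropped if it shares any normalized key with a record already kept, so chains (A shares a phone with B, B shares an email with C) are not merged transitively.

```python
import random
import re

# Tiny illustrative subset of USPS standard abbreviations (hypothetical; see USPS Pub. 28).
ABBREV = {"street": "st", "avenue": "ave", "road": "rd", "drive": "dr",
          "apartment": "apt", "suite": "ste"}


def norm_email(email):
    return email.strip().lower() if email else None


def norm_phone(phone):
    # Drop non-digits, then any leading 1s.
    digits = re.sub(r"\D", "", phone or "").lstrip("1")
    return digits or None


def norm_address(address1, address2):
    # Concatenate, lowercase, collapse whitespace, then abbreviate word by word.
    text = re.sub(r"\s+", " ", f"{address1 or ''} {address2 or ''}".lower()).strip()
    return " ".join(ABBREV.get(w, w) for w in text.split(" ")) or None


def drop_household_duplicates(records, rng):
    """Shuffle, then drop any record sharing a normalized email, phone,
    or address with a record already kept (greedy, not transitive)."""
    records = list(records)
    rng.shuffle(records)
    seen, kept = set(), []
    for rec in records:
        keys = {("email", norm_email(rec.get("email"))),
                ("phone", norm_phone(rec.get("phone"))),
                ("address", norm_address(rec.get("address1"), rec.get("address2")))}
        keys = {k for k in keys if k[1]}  # ignore empty values
        if keys & seen:
            continue
        seen |= keys
        kept.append(rec)
    return kept


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def assign_duplicate_ids(addresses, threshold, rng):
    """Return one DUPLICATE_ID per normalized address. Each still-unassigned address
    shares a new random ID with every unassigned address closer than threshold;
    an address with no such neighbour gets 0."""
    ids = [None] * len(addresses)
    used = {0}
    for i, addr in enumerate(addresses):
        if ids[i] is not None:
            continue
        group = [j for j in range(i + 1, len(addresses))
                 if ids[j] is None and levenshtein(addr, addresses[j]) < threshold]
        if not group:
            ids[i] = 0
            continue
        new_id = 0
        while new_id in used:
            new_id = rng.randrange(1, 10**6)
        used.add(new_id)
        for j in [i, *group]:
            ids[j] = new_id
    return ids
```

The all-pairs comparison is quadratic, so for a cohort this large one might only compare addresses within the same ZIP code; that blocking step is an assumption, not part of the plan. The resulting IDs would go into the DUPLICATE_ID column, with the normalized addresses themselves discarded before upload.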
_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
