Current survey management plan for Obesity

Alex Bokov Tue, 17 Feb 2015 09:14:39 -0800

The obesity REDCap project will have a survey form and a data form for study tracking:


 * Tracker (contains invitation mail-out dates, non-automated response
   status info, etc.)
     o data can/will be uploaded in batch (Excel spreadsheet), e.g. to
       assign a mail out date to wave X participants
     o records can also be edited in REDCap by the study coordinator
 * Survey
     o contains survey responses
     o data entered by study participant or coordinator (if respondent
       uses phone/snail mail)

I know that earlier we reported problems tracking respondents who have not yet filled out a survey. This turned out to be due to having auto-numbering turned on. If you run into this problem at your site and want more details, we can send those in a separate email. Moving right along...

The tracker will be the first form in the project, and it will be batch uploaded from Excel/CSV with the study-assigned ID in the first column, so that this ID becomes the record linking field. Then we will use the REDCap participant list to generate unique survey URLs to be printed on snail-mailed invitations. This will be done by sending dummy invitation emails in bulk to an email address we control, which will also be a field in the tracker form. The list of participants along with their unique URLs can then be exported for printing the snail-mail survey invitations (presumably via some mail-merge feature in the researcher's word processor).
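
As a rough sketch of the last step, the exported participant list could be joined back to the tracker rows on the record ID to produce mail-merge input. The column names `record` and `survey_link`, the `study_id` field, and the example URL are assumptions for illustration, not confirmed names from the REDCap export.

```python
def merge_links(tracker_rows, participant_rows, id_field="study_id"):
    """Attach each participant's unique survey URL to the matching tracker row.

    Assumes the exported participant list has 'record' and 'survey_link'
    columns (hypothetical names) and that 'record' equals the study ID.
    """
    links = {p["record"]: p["survey_link"] for p in participant_rows}
    merged = []
    for row in tracker_rows:
        link = links.get(row[id_field])
        if link:  # rows without a survey link are left out of the mail merge
            merged.append({**row, "survey_link": link})
    return merged
```

The result is a list of dicts that can be written out with `csv.DictWriter` and fed to a word processor's mail merge.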


We would also like to do the following, but have not tested anything yet:

 * Export return codes for unique survey URLs (to help users who lost
   theirs)
     o Been done before per UMN Redcap FAQ, but requires plugin
       specific to UMN
 * Generate QR codes to be printed along with survey URLs
     o Been done before per U Iowa survey tricks

When exporting data from Oracle into the tracker form, we use patient information (BMI percentile, age, sex) to assign the study number, which includes an encoded age-BMI bin and extra digits or characters that will guarantee that no two respondents at different sites will have the same ID.
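
To make the idea concrete, here is a minimal sketch of one way such an ID could be built. The bin boundaries, the site prefix, and the `SITE-bins-sequence` layout are all made up for illustration; the actual encoding has not been specified here.

```python
# Hypothetical half-open [low, high) bins; the real study bins may differ.
AGE_BINS = [(2, 6), (6, 12), (12, 18)]       # age in years
BMI_BINS = [(0, 85), (85, 95), (95, 101)]    # BMI percentile


def bin_index(value, bins):
    """Return the index of the bin containing value, or None if out of range."""
    for i, (low, high) in enumerate(bins):
        if low <= value < high:
            return i
    return None


def make_study_id(site, age_years, bmi_percentile, sequence):
    """Build an ID like 'SA-11-00012': site prefix, age/BMI bin digits, counter.

    The site prefix keeps IDs from colliding across sites, as long as each
    site uses its own prefix and never reuses a sequence number.
    """
    a = bin_index(age_years, AGE_BINS)
    b = bin_index(bmi_percentile, BMI_BINS)
    if a is None or b is None:
        raise ValueError("patient falls outside the assumed bins")
    return f"{site}-{a}{b}-{sequence:05d}"
```

For example, `make_study_id("SA", 7, 90.0, 12)` gives `"SA-11-00012"` under these assumed bins.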

We use the emergency contact information to determine who/where to mail the survey. Multiple emergency contacts exist for patients. For example, UTHSCSA patients may have up to five separate sets of emergency contact information in Epic. For choosing a contact, if there is no guardian, the tentative plan is to fall back on mother, then father, then emergency contact 1, and finally emergency contact 2. We are working on ways to empirically decide which is the best order of precedence and also awaiting guidance from some of our local experienced study coordinators. If any of y'all have insights into this, please speak up.
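
The tentative fallback order could be sketched like this. The role names and the dict layout are hypothetical stand-ins, not actual Epic column names.

```python
# Tentative order of precedence from the plan; role names are hypothetical.
PRECEDENCE = [
    "guardian",
    "mother",
    "father",
    "emergency_contact_1",
    "emergency_contact_2",
]


def choose_contact(contacts):
    """Return (role, contact) for the first role with a usable address.

    contacts maps a role name to a dict such as
    {"name": ..., "address": ...}, or to None when that role is absent.
    Returns (None, None) if no role has an address.
    """
    for role in PRECEDENCE:
        contact = contacts.get(role)
        if contact and contact.get("address"):
            return role, contact
    return None, None
```

Keeping the order in a single list makes it easy to change once we have an empirical answer or guidance from the coordinators.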

Update: it appears that the contact address info in the PATIENT table itself is the most used, but we are still investigating where to pull the salutation from: PROXY_NAME or GUARDIAN_NAME (or some other field). Anybody have any thoughts?

We also use contact information to reduce the number of duplicate mailings. To that end, we are interested in any tools for formatting addresses into USPS standard, comparing for duplicates, and validating addresses. Our current plan is:


 * Sort the entries in a random order.
 * Randomly remove all but one record from each household, where a
   household is defined as follows:
     o Treat all case-normalized duplicate email addresses as
       representing the same household.
     o Remove all non-numeric characters and leading 1s from phone
       numbers then treat all identical matches that result as
       representing the same household.
     o Concatenate ADDRESS1 and ADDRESS2 fields, replace all runs of
       whitespace with a single whitespace character, convert
       everything to lowercase, use certain USPS conversion rules
       (converting to standard abbreviations except where this would
       cause ambiguity), and then treat all identical matches that
       result as representing the same household.
 * After the high-confidence duplicates are removed, there will remain
   clusters of similar addresses that may or may not be duplicates. We
   will not auto-cull them, but we will flag them in a way that will
   hopefully make them easier for a human to spot. The tentative plan
   is to:
     o Take each address (normalized for case, spaces, and
       abbreviations in the previous step) and calculate the Levenshtein
       distance to every other such address.
     o All addresses with a distance lower than our threshold will be
       assigned the same randomly generated ID in the DUPLICATE_ID column.
     o We then skip to the next address that doesn't yet have a
       DUPLICATE_ID and repeat the process until we run out of addresses.
     o Addresses that have no other addresses below the similarity
       threshold will all be assigned a DUPLICATE_ID of 0.
 * The normalized addresses will not be part of the final output to be
   uploaded into REDCap, but the DUPLICATE_ID field will remain.
 * In REDCap we will create a report that pulls only entries where
   DUPLICATE_ID != 0 and sorts those entries by DUPLICATE_ID. A study
   coordinator would then glance through these clusters and if they are
   actually different addresses (e.g. adjacent houses, or apartments
   within the same building, and probably more exotic variants), change
   their DUPLICATE_ID to 0. The remaining ones would be deleted except
   for the first entry.
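
The high-confidence steps above can be sketched as follows. This is only an illustration under some assumptions: the record field names (`email`, `phone`, `address1`, `address2`) are hypothetical, the abbreviation table is a tiny made-up subset of the USPS rules, and the single pass is order-dependent (a record's keys are remembered even when it is dropped, but fully transitive household matching would need something like union-find).

```python
import random
import re

# Tiny illustrative subset of USPS abbreviations, not the full table.
USPS_ABBREV = {"street": "st", "avenue": "ave", "road": "rd", "drive": "dr",
               "boulevard": "blvd", "lane": "ln", "apartment": "apt"}


def norm_email(email):
    """Case-normalize an email address."""
    return email.strip().lower()


def norm_phone(phone):
    """Drop non-numeric characters, then any leading 1s."""
    return re.sub(r"\D", "", phone).lstrip("1")


def norm_address(address1, address2=""):
    """Concatenate, collapse whitespace, lowercase, abbreviate known words."""
    text = re.sub(r"\s+", " ", f"{address1} {address2}").strip().lower()
    words = (w.strip(".,") for w in text.split(" "))
    return " ".join(USPS_ABBREV.get(w, w) for w in words if w)


def household_keys(rec):
    """Return the set of normalized keys that identify a record's household."""
    keys = set()
    if rec.get("email"):
        keys.add(("email", norm_email(rec["email"])))
    if rec.get("phone"):
        keys.add(("phone", norm_phone(rec["phone"])))
    if rec.get("address1"):
        keys.add(("addr", norm_address(rec["address1"], rec.get("address2") or "")))
    return keys


def drop_household_duplicates(records, seed=None):
    """Shuffle, then keep a record only if none of its keys was seen before."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    seen, kept = set(), []
    for rec in shuffled:
        keys = household_keys(rec)
        if not (keys & seen):
            kept.append(rec)
        seen |= keys  # remember keys of dropped records too
    return kept
```

Because the list is shuffled first, which record survives from each household is random, as the plan calls for. Missing emails or phone numbers are skipped rather than treated as matching each other.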

This will not remove all duplicates, only diminish them. The primary goal is not avoiding siblings. In fact, siblings living in separate households will most likely slip through. This is just a limitation of this study design we have to live with. The real reason removing duplicates matters is minimizing how many households we irritate with repeat mailings.
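
For the flagging of similar-but-not-identical addresses in the plan above, a minimal sketch might look like the following. The threshold value of 3 is a placeholder, the edit distance is a plain textbook implementation rather than any particular library, and one detail is my own choice: if a neighbor already belongs to a cluster, its ID is reused instead of generating a new one.

```python
import random


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def assign_duplicate_ids(addresses, threshold=3, seed=None):
    """Return one DUPLICATE_ID per normalized address; 0 means no near match."""
    rng = random.Random(seed)
    ids = [None] * len(addresses)
    used = {0}
    for i, addr in enumerate(addresses):
        if ids[i] is not None:
            continue  # already placed in a cluster
        near = [j for j, other in enumerate(addresses)
                if j != i and levenshtein(addr, other) < threshold]
        if not near:
            ids[i] = 0
            continue
        existing = [ids[j] for j in near if ids[j]]
        if existing:
            new_id = existing[0]  # join a neighbor's existing cluster
        else:
            new_id = rng.randint(1, 10**9)
            while new_id in used:
                new_id = rng.randint(1, 10**9)
            used.add(new_id)
        ids[i] = new_id
        for j in near:
            if ids[j] is None:
                ids[j] = new_id
    return ids
```

This compares every pair of addresses, so it is quadratic in the number of records. That is probably tolerable for a few thousand addresses per site, but it is worth keeping in mind for larger extracts.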

i2b2 does not contain these emergency contact fields. Therefore, even if a site has an identified i2b2 instance, it will not be useful for extracting contact information. We see no practical alternative at this time to pulling these fields from the Epic source, then assigning the study IDs and doing the duplicate detection within a Python script. This script will either output a CSV file ready to upload into REDCap to create the tracker or directly create the tracker via the REDCap API. We will send this script out to the study sites. Unless anybody has any better suggestions?
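
As a rough idea of the CSV output path only, the script's last step might look something like this. The field names are hypothetical; the only points taken from the plan are that the study ID goes in the first column and that working columns such as the normalized address are not written out.

```python
import csv


def write_tracker_csv(rows, out,
                      fieldnames=("study_id", "mailout_date", "duplicate_id")):
    """Write tracker rows as CSV with the study ID in the first column.

    rows is an iterable of dicts; keys not listed in fieldnames (for
    example a normalized address used only for duplicate detection)
    are silently dropped, and missing keys are written as empty cells.
    """
    writer = csv.DictWriter(out, fieldnames=list(fieldnames),
                            extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

Usage would be along the lines of `with open("tracker.csv", "w", newline="") as f: write_tracker_csv(rows, f)`.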

_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
