The obesity REDCap project will have a survey and a data form for study
tracking:
* Tracker (contains invitation mail-out dates, non-automated response
  status info, etc.)
    o data can/will be uploaded in batch (Excel spreadsheet), e.g. to
      assign a mail-out date to wave X participants
    o records can also be edited in REDCap by the study coordinator
* Survey
    o contains survey responses
    o data entered by study participant or coordinator (if respondent
      uses phone/snail mail)
I know that earlier we reported problems tracking respondents who have
not yet filled out a survey. This turned out to be due to having
auto-numbering turned on. If you run into this problem at your site and
want more details, we can send those in a separate email. Moving right
along...
The tracker will be the first form in the project, and it will be batch
uploaded from Excel/CSV with the study-assigned ID in the first column,
so that this ID becomes the record linking field. We will then use the
REDCap participant list to generate unique survey URLs to be printed on
snail-mailed invitations. This will be done by sending dummy invitation
emails in bulk to an email address we control, which will also be a
field in the tracker form. The list of participants along with their
unique URLs can then be exported for printing the snail-mail survey
invitations (presumably via some mail-merge feature in the researcher's
word processor).
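To make the mail-merge step concrete, here is a minimal sketch of joining the tracker rows to the exported list of survey URLs. The column names (study_id, survey_url, address) are hypothetical; the actual headers in the REDCap participant list export may differ, so treat this as an illustration rather than the final script.

```python
import csv
import io


def build_mail_merge_rows(tracker_rows, link_rows):
    """Join tracker rows to exported survey links on the study ID.

    Both arguments are lists of dicts. The keys "study_id" and
    "survey_url" are hypothetical column names, not REDCap's own.
    """
    links = {row["study_id"]: row["survey_url"] for row in link_rows}
    merged = []
    for row in tracker_rows:
        url = links.get(row["study_id"])
        if url is None:
            continue  # no link exported (yet) for this participant
        merged.append({**row, "survey_url": url})
    return merged


def to_csv(rows):
    """Serialize rows to CSV text that a word processor's mail merge can read."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Participants without an exported link are skipped rather than written with a blank URL, so an invitation never goes out without a usable link.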
We would also like to do the following, but have not tested anything yet:
* Export return codes for unique survey URLs (to help users who lost
  theirs)
    o Been done before per UMN REDCap FAQ, but requires plugin
      specific to UMN
* Generate QR codes to be printed along with survey URLs
    o Been done before per U Iowa survey tricks
When exporting data from Oracle into the tracker form, we use patient
information (BMI percentile, age, sex) to assign the study number, which
includes an encoded age-BMI-bin, and extra digits or characters that
will guarantee that no two respondents at different sites will have the
same ID.
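As a rough illustration of the ID scheme, the sketch below builds an ID from a site code, an age/BMI bin code, and a zero-padded serial number. The cut points, the two-letter site codes, and the ID layout are all assumptions made up for this example; the real bins and format are not spelled out in this message. Sex is left out because the message does not say how it enters the encoding.

```python
import bisect

# Hypothetical cut points; the real study bins are not given in this message.
AGE_CUTS = [6, 12]        # years: <6, 6-11, >=12
BMI_PCT_CUTS = [85, 95]   # BMI percentile: <85, 85-94, >=95


def bin_code(age_years, bmi_percentile):
    """Return a two-digit code: age bin index followed by BMI bin index."""
    age_bin = bisect.bisect_right(AGE_CUTS, age_years)
    bmi_bin = bisect.bisect_right(BMI_PCT_CUTS, bmi_percentile)
    return f"{age_bin}{bmi_bin}"


def make_study_id(site_code, age_years, bmi_percentile, serial):
    """Build an ID as site code + bin code + zero-padded per-site serial.

    IDs from different sites cannot collide as long as every site uses a
    distinct site_code of the same length (an assumption of this sketch).
    """
    return f"{site_code}{bin_code(age_years, bmi_percentile)}{serial:05d}"


# Example: a 7-year-old at the 96th percentile, 42nd patient at site "UT"
print(make_study_id("UT", 7, 96.0, 42))  # UT1200042
```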
We use the emergency contact information to determine who/where to mail
the survey. Multiple emergency contacts exist for patients. For example,
UTHSCSA patients may have up to five separate sets of emergency contact
information in Epic. For choosing a contact, if there is no guardian,
the tentative plan is to fall back on mother, then father, then
emergency contact 1, and finally emergency contact 2. We are working on
ways to empirically decide which is the best order of precedence and
also awaiting guidance from some of our local experienced study
coordinators. If any of y'all have insights into this, please speak up.
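The tentative fallback order could be sketched as a simple ordered lookup. The role names used as dictionary keys here are hypothetical labels for illustration, not actual Epic column names.

```python
# Tentative order of precedence from the plan above; the role names are
# hypothetical labels, not Epic field names.
PRECEDENCE = [
    "guardian",
    "mother",
    "father",
    "emergency_contact_1",
    "emergency_contact_2",
]


def choose_contact(contacts):
    """Return (role, contact) for the first non-empty role, or None.

    `contacts` maps a role name to a contact record (any truthy value);
    missing or empty entries are skipped.
    """
    for role in PRECEDENCE:
        contact = contacts.get(role)
        if contact:
            return role, contact
    return None
```

Keeping the order in one list means that if the empirical work suggests a different precedence, only that list has to change.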
Update: it appears that the contact address info in the PATIENT table
itself is the most used, but we are still investigating where to pull the
salutation from: PROXY_NAME or GUARDIAN_NAME (or some other field). Anybody
have any thoughts?
We also use contact information to reduce the number of duplicate
mailings. To that end, we are interested in any tools for formatting
addresses into USPS standard, comparing for duplicates, and validating
addresses. Our current plan is:
* Sort the entries in a random order.
* Randomly remove all but one of the records that come from the same
  household, defined as follows:
    o Treat all case-normalized duplicate email addresses as
      representing the same household.
    o Remove all non-numeric characters and leading 1s from phone
      numbers, then treat all identical matches that result as
      representing the same household.
    o Concatenate the ADDRESS1 and ADDRESS2 fields, replace all runs of
      whitespace with a single space, convert everything to lowercase,
      apply certain USPS conversion rules (converting to standard
      abbreviations except where this would cause ambiguity), and then
      treat all identical matches that result as representing the same
      household.
* After the high-confidence duplicates are removed, there will remain
  clusters of similar addresses that may or may not be duplicates. We
  will not auto-cull them, but we will flag them in a way that will
  hopefully make them easier for a human to spot. The tentative plan
  is to:
    o Take each address (normalized for case, spaces, and
      abbreviations in the previous step) and calculate the Levenshtein
      distance to every other such address.
    o All addresses with a distance lower than our threshold will be
      assigned the same randomly generated ID in the DUPLICATE_ID column.
    o We then skip to the next address that doesn't yet have a
      DUPLICATE_ID and repeat the process until we run out of addresses.
    o Addresses that have no other addresses below the distance
      threshold will all be assigned a DUPLICATE_ID of 0.
* The normalized addresses will not be part of the final output to be
  uploaded into REDCap, but the DUPLICATE_ID field will remain.
* In REDCap we will create a report that pulls only entries where
  DUPLICATE_ID != 0 and sorts those entries by DUPLICATE_ID. A study
  coordinator would then glance through these clusters and, if they are
  actually different addresses (e.g. adjacent houses, or apartments
  within the same building, and probably more exotic variants), change
  their DUPLICATE_ID to 0. Of the remaining ones, all but the first
  entry would be deleted.
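The high-confidence part of this plan (shuffle, normalize, keep one record per household) could look roughly like the sketch below. The record keys (email, phone, address1, address2) are hypothetical, the abbreviation table is only a tiny subset of the USPS suffix list, and stripping trailing periods and commas is an extra assumption not stated in the plan. The pass is greedy and depends on the shuffled order, so it is not a full transitive grouping.

```python
import random
import re

# Small illustrative subset of USPS standard abbreviations.
USPS_ABBREVIATIONS = {
    "street": "st", "avenue": "ave", "road": "rd", "drive": "dr",
    "boulevard": "blvd", "lane": "ln", "apartment": "apt",
}


def normalize_email(email):
    return email.strip().lower()


def normalize_phone(phone):
    """Drop non-digits, then strip leading 1s (e.g. the US country code)."""
    return re.sub(r"\D", "", phone).lstrip("1")


def normalize_address(address1, address2=""):
    """Concatenate, collapse whitespace, lowercase, abbreviate known words."""
    text = re.sub(r"\s+", " ", f"{address1} {address2}").strip().lower()
    words = (w.rstrip(".,") for w in text.split(" "))  # assumption: drop trailing punctuation
    return " ".join(USPS_ABBREVIATIONS.get(w, w) for w in words)


def household_keys(record):
    """Return the set of (kind, value) keys that identify a household."""
    keys = set()
    if record.get("email"):
        keys.add(("email", normalize_email(record["email"])))
    phone = normalize_phone(record.get("phone") or "")
    if phone:
        keys.add(("phone", phone))
    address = normalize_address(record.get("address1") or "", record.get("address2") or "")
    if address:
        keys.add(("address", address))
    return keys


def drop_household_duplicates(records, seed=None):
    """Shuffle, then keep a record only if none of its keys was seen before."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    seen, kept = set(), []
    for record in shuffled:
        keys = household_keys(record)
        if keys.isdisjoint(seen):
            kept.append(record)
        seen |= keys  # remember keys of dropped records too
    return kept
```

Passing a fixed seed makes a run reproducible, which may help when a coordinator needs to re-create exactly which records were culled.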
This will not remove all duplicates, only diminish them. The primary
goal is not avoiding siblings. In fact, siblings living in separate
households will most likely slip through. This is just a limitation of
this study design we have to live with. The real reason removing
duplicates matters is minimizing how many households we irritate with
repeat mailings.
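For the near-duplicate flagging step (the DUPLICATE_ID clusters), here is a minimal standard-library sketch. It departs from the plan in two labeled ways: cluster IDs are sequential rather than randomly generated, so that runs are reproducible, and an address whose only close neighbors already belong to a cluster joins that cluster instead of being given 0. The default threshold of 3 is a placeholder, not a tested value.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def assign_duplicate_ids(addresses, threshold=3):
    """Return one DUPLICATE_ID per normalized address; 0 means no close neighbor."""
    ids = [None] * len(addresses)
    next_id = 1
    for i, addr in enumerate(addresses):
        if ids[i] is not None:
            continue  # already placed in a cluster by an earlier address
        neighbors = [j for j, other in enumerate(addresses)
                     if j != i and levenshtein(addr, other) < threshold]
        if not neighbors:
            ids[i] = 0
            continue
        existing = [ids[j] for j in neighbors if ids[j]]
        if existing:
            cluster_id = existing[0]  # join a neighbor's existing cluster
        else:
            cluster_id = next_id
            next_id += 1
        ids[i] = cluster_id
        for j in neighbors:
            if ids[j] is None:
                ids[j] = cluster_id
    return ids
```

This compares every pair of addresses, so the cost grows quadratically with the number of records. That is probably fine for a few thousand rows per site, but it is worth keeping in mind for larger extracts.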
i2b2 does not contain these emergency contact fields. Therefore, even if
a site has an identified i2b2 instance, it will not be useful for
extracting contact information. We see no practical alternative at this
time to pulling these fields from the Epic source, then generating the study
IDs and doing duplicate detection within a Python script. This script will
either output a CSV file ready to upload into REDCap to create the
tracker or directly create the tracker via the REDCap API. We will send
this script out to the study sites. Unless anybody has any better
suggestions?
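A minimal sketch of the two output paths, under some assumptions: the tracker field names are hypothetical, the API URL and token are site-specific, and the payload uses the record-import parameters as I understand them from the REDCap API documentation (content=record, format=csv, type=flat), so please check them against your own instance's API help page before relying on this.

```python
import csv
import io
import urllib.parse
import urllib.request


def tracker_csv(rows, fieldnames):
    """Write rows as CSV with the record ID first; extra keys are dropped.

    `fieldnames` must start with the study ID column so REDCap uses it as
    the record identifier. Keys not listed (e.g. a normalized address)
    are left out of the output.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def import_payload(csv_text, token):
    """Build the form fields for a REDCap record import (assumed parameters)."""
    return {
        "token": token,
        "content": "record",
        "format": "csv",
        "type": "flat",
        "overwriteBehavior": "normal",
        "data": csv_text,
        "returnContent": "count",
        "returnFormat": "json",
    }


def upload(api_url, payload):
    """POST the payload to the site's REDCap API endpoint and return the reply."""
    body = urllib.parse.urlencode(payload).encode("utf-8")
    request = urllib.request.Request(api_url, data=body, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

A site that prefers the manual route can simply save the output of tracker_csv to a file and upload it through the REDCap data import tool, skipping upload entirely.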