For the pediatric obesity survey, we wanted to identify siblings so that we do
not send multiple surveys to the same family. Some false positives are OK,
especially for such a large cohort. The best way of finding siblings seems to
be matching on parent/guardian phone numbers, since we have many siblings who,
for various reasons, do not share a contact name or address. Here’s the
process, with a little background on how we get parent/guardian contact info.
We use the following order of precedence for contact names, addresses, and
phone numbers, taking the first one where all address fields pass minimal
validation checks:
clarity.account (billing), clarity.emergency_contacts (mother > father >
guardian > emerg1 > emerg2).
If the contact full name matches the patient full name, we assume this is an
independent young adult, and may skip them accordingly. If the preferred
address has no name, we send letters to Mr. or Mrs. Patient_Last_Name. If none
of the preferred addresses are available, we use the patient address and send
the letter to Mr. or Mrs. Patient_Last_Name.
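The precedence and fallback rules above can be sketched roughly as follows.
This is only an illustration, not our production code: the dictionary layout,
the source labels, and the address_ok() check are hypothetical stand-ins for
the Clarity fields and for our "minimal validation checks", and returning None
is one possible way to flag a patient who may be skipped.

```python
from typing import Optional

# Hypothetical source labels, in the order of precedence described above.
PRECEDENCE = ["account", "mother", "father", "guardian", "emerg1", "emerg2"]


def address_ok(contact: dict) -> bool:
    """Stand-in for the minimal validation: every address field is non-empty."""
    return all(contact.get(f) for f in ("line1", "city", "state", "zip"))


def pick_contact(patient: dict, contacts: dict) -> Optional[dict]:
    """Pick the mailing contact, or None for an independent young adult."""
    full_name = f"{patient['first']} {patient['last']}".upper()
    fallback_name = f"Mr. or Mrs. {patient['last']}"
    for source in PRECEDENCE:
        c = contacts.get(source)
        if c and address_ok(c):
            name = (c.get("name") or "").strip()
            if name.upper() == full_name:
                # Contact is the patient: assume an independent young adult.
                return None
            return {**c, "name": name or fallback_name, "source": source}
    # No preferred address passed validation: use the patient's own address.
    return {**patient["address"], "name": fallback_name, "source": "patient"}
```

The first source whose address passes validation wins, so a billing account
with a bad address falls through to the emergency contacts.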
To find siblings, we first compare valid phone numbers for matches, then zip
codes, and then do a fuzzy match (using Levenshtein distance) on addresses
that are formatted for comparison as follows:
- ignore city and state (using zip code only)
- concatenate street line1+line2 and upper case it
- remove periods and commas
- abbreviate directional words per USPS rules
- abbreviate street suffixes, PO boxes, and rural route types per USPS rules
- remove commonly omitted street suffixes (AVE, DR, LN, RD, ST)
- apartment, #, lot, or unit designations are formatted similarly (e.g.
‘unit|123’)
- tokenize the string and list in the following order
  - ZIP|BOX[|RT NUM][RT TYPE]
    e.g. ‘PO Box 99 RR 2 Poteet, Tx 99999’ -> ‘99999|99|RR|9’
  - ZIP|STREET NAME|STREET NUM[|UNIT|123]
    e.g. ‘123 Main Ave # 44 San Antonio, TX 99999’ -> ‘99999|MAIN|123|unit|44’
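A rough sketch of the street-address case of the formatting steps above (PO
boxes and rural routes are left out), followed by the Levenshtein comparison.
The abbreviation tables are a tiny subset of the USPS lists, and the function
names and the two threshold values are made up for illustration; they are not
the values we actually used.

```python
import re

# Small illustrative subsets; the real USPS tables are much longer.
DIRECTIONS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}
SUFFIXES = {"AVENUE": "AVE", "DRIVE": "DR", "LANE": "LN", "ROAD": "RD", "STREET": "ST"}
OMITTED = {"AVE", "DR", "LN", "RD", "ST"}
# Trailing apartment / # / lot / unit designation, e.g. "APT 4B" or "# 44".
UNIT_RE = re.compile(r"\s*(?:\b(?:APT|APARTMENT|UNIT|LOT)\b\s*#?|#)\s*(\w+)\s*$")


def address_key(line1: str, line2: str, zip_code: str) -> str:
    """Build a ZIP|STREET NAME|STREET NUM[|unit|N] comparison key."""
    text = f"{line1} {line2}".upper().replace(".", "").replace(",", "").strip()
    unit = None
    m = UNIT_RE.search(text)
    if m:
        unit = m.group(1)
        text = text[: m.start()]
    words = [DIRECTIONS.get(w, w) for w in text.split()]
    words = [SUFFIXES.get(w, w) for w in words]
    words = [w for w in words if w not in OMITTED]
    number = words[0] if words and words[0][0].isdigit() else ""
    name = " ".join(words[1:] if number else words)
    parts = [zip_code[:5], name, number]
    if unit:
        parts += ["unit", unit]
    return "|".join(parts)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def same_address(key_a: str, key_b: str, base: float = 0.85, with_unit: float = 0.95) -> bool:
    """Fuzzy match on keys; thresholds are hypothetical, stricter with units."""
    if key_a.split("|")[0] != key_b.split("|")[0]:
        return False  # zip codes must match before any fuzzy comparison
    threshold = with_unit if "|unit|" in key_a or "|unit|" in key_b else base
    longest = max(len(key_a), len(key_b)) or 1
    return 1 - levenshtein(key_a, key_b) / longest >= threshold
```

With these tables, address_key("123 Main Ave", "# 44", "99999") gives
‘99999|MAIN|123|unit|44’, the same key as the example in the list.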
We set the fuzzy match threshold higher for addresses with unit numbers. This
would be much simpler if we had a geocoding system in place. However, as it
turned out, in our cohort of 4220 patients, we identified 193 family groups,
and only resorted to address matching for 28 of them. 165 were matched on phone
numbers alone.
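Since most groups came from the phone step, here is a minimal sketch of how
patients sharing a valid phone number could be merged into family groups. The
validity rule (10 digits, not all the same digit) and the union-find grouping
are assumptions for illustration, not a description of our actual code.

```python
import re
from collections import defaultdict


def normalize_phone(raw):
    """Return a 10-digit string, or None if the number looks invalid."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading US country code
    if len(digits) != 10 or len(set(digits)) == 1:
        return None  # wrong length, or a placeholder such as 000-000-0000
    return digits


def family_groups(phones_by_patient):
    """Group patient IDs that share at least one valid phone number."""
    parent = {p: p for p in phones_by_patient}

    def find(p):
        # Follow parent links to the group root, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    first_seen = {}  # normalized phone -> first patient seen with it
    for patient, phones in phones_by_patient.items():
        for raw in phones:
            phone = normalize_phone(raw)
            if phone is None:
                continue
            if phone in first_seen:
                parent[find(patient)] = find(first_seen[phone])
            else:
                first_seen[phone] = patient

    groups = defaultdict(set)
    for p in parent:
        groups[find(p)].add(p)
    return [g for g in groups.values() if len(g) > 1]
```

Patients left ungrouped after this step would then go on to the zip code and
address comparison.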
We also had some potential false positives for what appear to be multiple
families living together. In other words, contact addresses matched exactly,
but no names or phone numbers matched. Again, for the purposes of our survey
and limiting responses to one per family, we are OK with this.
-Angela | UTHSCSA
From: [email protected]
[mailto:[email protected]] On Behalf Of Bos, Angela
Sent: Thursday, May 07, 2015 12:50 PM
To: Justin Dale; Dan Connolly
Cc: [email protected]; GPC Obesity Research Team
([email protected])
Subject: RE: maternal, sibling linkage for antibiotic obesity study?
Our preliminary checks showed we had little data in our Clarity patient
relationship link fields. But we have used a fuzzy matching algorithm on
patient contact info to identify siblings, which isn’t 100% accurate and could
use some improvement, so our answer was ‘yes’ with caveats. The basic algorithm
will be shared with gpc-dev soon.
- Angela | UTHSCSA
From: [email protected]
[mailto:[email protected]] On Behalf Of Justin Dale
Sent: Thursday, May 07, 2015 12:33 PM
To: Dan Connolly
Cc: [email protected]; GPC Obesity Research Team
([email protected])
Subject: Re: maternal, sibling linkage for antibiotic obesity study?
From the Clarity data dictionary:
Table: PATIENT_2
Field: MOTHER_PAT_ID
Description: The unique ID of the system patient record belonging to the mother
of this patient. This item is populated if the mother's record is linked to the
patient record in enterprise registration system Registration's emergency
contacts. This ID may be encrypted.
Table: PATIENT_2
Field: FATHER_PAT_ID
Description: The unique ID of the system patient record belonging to the father
of this patient. This item is populated if the father's record is linked to the
patient record in enterprise registration system Registration's emergency
contacts. This ID may be encrypted.
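Where these two fields are populated, sibling pairs could in principle be
found with a self-join on them. The sketch below uses a toy in-memory SQLite
table as a stand-in for Clarity; the PAT_ID key column and the sample rows are
assumptions for illustration, and a real Clarity query may need to deal with
encrypted IDs.

```python
import sqlite3

# Toy stand-in for Clarity's PATIENT_2; only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PATIENT_2 ("
    "PAT_ID TEXT PRIMARY KEY, MOTHER_PAT_ID TEXT, FATHER_PAT_ID TEXT)"
)
conn.executemany(
    "INSERT INTO PATIENT_2 VALUES (?, ?, ?)",
    [
        ("P1", "M1", None),
        ("P2", "M1", "F1"),
        ("P3", None, "F1"),
        ("P4", None, None),
    ],
)

# Pairs of patients that share a linked mother or a linked father.
# NULL never equals NULL in SQL, so unlinked patients are not paired.
SIBLING_SQL = """
SELECT a.PAT_ID, b.PAT_ID
FROM PATIENT_2 a
JOIN PATIENT_2 b ON a.PAT_ID < b.PAT_ID
WHERE a.MOTHER_PAT_ID = b.MOTHER_PAT_ID
   OR a.FATHER_PAT_ID = b.FATHER_PAT_ID
ORDER BY a.PAT_ID, b.PAT_ID
"""
sibling_pairs = conn.execute(SIBLING_SQL).fetchall()
```

On the sample rows this pairs P1 with P2 (same mother) and P2 with P3 (same
father), and leaves P4 unmatched.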
Justin
On Thu, May 7, 2015 at 12:19 PM, Dan Connolly
<[email protected]> wrote:
Do you know how we get these from CLARITY?
--
Dan
________________________________
From: Campbell, James R [[email protected]]
Sent: Thursday, May 07, 2015 11:59 AM
To: Dan Connolly; [email protected]; GPC Obesity Research Team
([email protected])
Subject: RE: maternal, sibling linkage for antibiotic obesity study?
Simple links between record IDs of parents and family members could be
maintained as social history observations using clinical observation codes in
i2b2:
Identity of mother LOINC 74025-8
Identity of father LOINC 74026-6
Identity of family member LOINC 74024-1
LOINC also has a substantial number of observables relating to data about the
mother and child:
Maternal education 57712-2
Maternal pregnancies 75201-4
Maternal marital status 75257-6
James R. Campbell MD
[email protected]
Office 402-559-7505
Secretary 402-559-7299
Fax 402-559-8396
Pager 402-888-1230
From: [email protected]
[mailto:[email protected]] On Behalf Of Dan Connolly
Sent: Thursday, May 07, 2015 8:55 AM
To: [email protected]
Subject: maternal, sibling linkage for antibiotic obesity study?
Question 4 of the antibiotic survey
(#277<https://informatics.gpcnetwork.org/trac/Project/ticket/277>) asks about
maternal and sibling links. We have an issue in this area,
#103<https://informatics.gpcnetwork.org/trac/Project/ticket/103>, but we closed
it as wontfix. Did any of you answer "yes" to 4a/b/c? If so, we should perhaps
re-open #103.
--
Dan
_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev