For the purposes of the pediatric obesity survey, we wanted to identify 
siblings to prevent sending multiple surveys to the same family, and having 
some false positives is OK, especially for such a large cohort. The best way of 
finding them seems to be matching by the parent/guardian phone numbers. We have 
many siblings who, for various reasons, do not share a contact name or address. 
Here’s the process with a little bit of background on how we get 
parent/guardian contact info.

We use the following order of precedence for contact names, addresses, and 
phone numbers, taking the first one where all address fields pass minimal 
validation checks:
clarity.account (billing), then clarity.emergency_contacts (mother > father > 
guardian > emerg1 > emerg2).
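
The precedence rule can be sketched roughly as follows. This is only an illustration, not our production code: the dictionary keys, the function names, and the specific validation rule (non-empty street plus a 5-digit ZIP) are assumptions made for the example.

```python
from typing import Optional

# Order of precedence from the text above. The key names are hypothetical
# labels for the billing account and the emergency-contact roles.
PRECEDENCE = ["account", "mother", "father", "guardian", "emerg1", "emerg2"]


def address_is_valid(contact: dict) -> bool:
    """Stand-in for the minimal validation checks.

    Assumed rule: a non-empty street line and a 5-digit ZIP code.
    """
    street = (contact.get("street") or "").strip()
    zip_code = (contact.get("zip") or "").strip()
    return bool(street) and len(zip_code) == 5 and zip_code.isdigit()


def pick_contact(contacts: dict) -> Optional[dict]:
    """Return the first contact, in precedence order, with a valid address."""
    for source in PRECEDENCE:
        contact = contacts.get(source)
        if contact is not None and address_is_valid(contact):
            return contact
    return None  # caller falls back to the patient's own address
```

For instance, if the billing account has an empty street line but the father's emergency-contact record is complete, pick_contact returns the father's record.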

If the contact full name matches the patient full name, we assume this is an 
independent young adult, and may skip them accordingly. If the preferred 
address has no name, we send letters to Mr. or Mrs. Patient_Last_Name. If none 
of those preferred addresses are available, we use the patient address and send 
the letter to Mr. or Mrs. Patient_Last_Name.
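
A minimal sketch of the addressee rule, assuming a case-insensitive comparison of full names. The function name and the use of None to mean "likely an independent young adult, may be skipped" are choices made for this example only.

```python
from typing import Optional


def addressee(contact_name: Optional[str], patient_first: str,
              patient_last: str) -> Optional[str]:
    """Decide who the letter is addressed to.

    Returns None when the contact appears to be the patient (assumed to be
    an independent young adult), so the caller can decide whether to skip.
    """
    name = (contact_name or "").strip()
    if not name:
        # No contact name on the chosen address: use the generic salutation.
        return f"Mr. or Mrs. {patient_last}"
    patient_full = f"{patient_first} {patient_last}".strip().upper()
    if name.upper() == patient_full:
        return None
    return name
```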

To find siblings, we first compare valid phone numbers for matches, then 
zip codes, and then do a fuzzy match (using Levenshtein distance) on addresses 
that are formatted for comparison as follows:

- ignore city and state (using zip code only)
- concatenate street line1+line2 and upper case it
- remove periods and commas
- abbreviate directional words per USPS rules
- abbreviate street suffixes, PO boxes, and rural route types per USPS rules
- remove commonly omitted street suffixes (AVE, DR, LN, RD, ST)
- apartment, #, lot, or unit designations are formatted similarly (e.g. 
‘unit|123’)
- tokenize the string and list in the following order
   - ZIP|BOX[|RT NUM][RT TYPE]
      - e.g. PO Box 99 RR 2 Poteet, Tx 99999 -> ‘99999|99|RR|9’
   -  ZIP|STREET NAME|STREET NUM[|UNIT|123]
      - e.g. ‘123 Main Ave # 44 San Antonio, TX 99999’ -> 
‘99999|MAIN|123|unit|44’
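
The street-address form of the key can be sketched as below. This is a simplified illustration under stated assumptions, not the code we run: the abbreviation tables are tiny hypothetical subsets of the USPS lists, the PO box / rural route form is not handled, and the function name is made up for the example.

```python
import re

# Small illustrative subsets; the real USPS tables are much longer.
DIRECTIONALS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}
SUFFIXES = {"AVENUE": "AVE", "DRIVE": "DR", "LANE": "LN",
            "ROAD": "RD", "STREET": "ST", "BOULEVARD": "BLVD"}
DROPPED = {"AVE", "DR", "LN", "RD", "ST"}   # commonly omitted suffixes
UNIT_WORDS = {"APT", "APARTMENT", "UNIT", "LOT", "#"}


def address_key(line1: str, line2: str, zip_code: str) -> str:
    """Build 'ZIP|STREET NAME|STREET NUM[|unit|N]' for fuzzy comparison."""
    text = f"{line1} {line2}".upper()        # concatenate lines, upper case
    text = re.sub(r"[.,]", "", text)         # remove periods and commas
    text = text.replace("#", " # ")          # so '#44' splits into '#', '44'
    tokens = text.split()

    unit = None
    street = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in UNIT_WORDS and i + 1 < len(tokens):
            unit = tokens[i + 1]             # all unit styles become 'unit|N'
            i += 2
            continue
        tok = DIRECTIONALS.get(tok, tok)
        tok = SUFFIXES.get(tok, tok)
        # Simplification: dropped suffixes are removed wherever they occur.
        if tok not in DROPPED:
            street.append(tok)
        i += 1

    number = street.pop(0) if street and street[0].isdigit() else ""
    parts = [zip_code, " ".join(street), number]
    if unit is not None:
        parts += ["unit", unit]
    return "|".join(parts)
```

With these assumptions, address_key("123 Main Ave # 44", "", "99999") gives '99999|MAIN|123|unit|44', matching the street example above.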

We set the fuzzy match threshold higher for addresses with unit numbers. This 
would be much simpler if we had a geocoding system in place. However, as it 
turned out, in our cohort of 4220 patients, we identified 193 family groups, 
and only resorted to address matching for 28 of them. 165 were matched on phone 
numbers alone.
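
As a rough sketch of the matching order (phone first, then zip code, then fuzzy address), assuming each patient is a dict with hypothetical 'id', 'phone', and 'key' fields, where 'key' is the formatted address string described above. The threshold values 0.85 and 0.95 are placeholders for illustration, not the values we actually used.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance scaled to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def same_household(p: dict, q: dict, base=0.85, with_unit=0.95) -> bool:
    if p["phone"] and p["phone"] == q["phone"]:
        return True                                    # phone match wins
    if p["key"].split("|")[0] != q["key"].split("|")[0]:
        return False                                   # zip codes must agree
    has_unit = "|unit|" in p["key"] or "|unit|" in q["key"]
    threshold = with_unit if has_unit else base        # stricter with units
    return similarity(p["key"], q["key"]) >= threshold


def family_groups(patients: list) -> list:
    """Union-find over pairwise matches; returns groups of 2+ patient ids."""
    parent = list(range(len(patients)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(patients)):
        for j in range(i + 1, len(patients)):
            if same_household(patients[i], patients[j]):
                parent[find(j)] = find(i)

    groups = {}
    for i, patient in enumerate(patients):
        groups.setdefault(find(i), []).append(patient["id"])
    return [g for g in groups.values() if len(g) > 1]
```

The pairwise loop is quadratic, which is tolerable for a cohort of a few thousand patients; a larger cohort would want to bucket by phone number and zip code first.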

We also had some potential false positives for what appear to be multiple 
families living together. In other words, contact addresses matched exactly, 
but no names or phone numbers matched. Again, for the purposes of our survey 
and limiting responses to one per family, we are OK with this.

-Angela | UTHSCSA


From: [email protected] 
[mailto:[email protected]] On Behalf Of Bos, Angela
Sent: Thursday, May 07, 2015 12:50 PM
To: Justin Dale; Dan Connolly
Cc: [email protected]; GPC Obesity Research Team 
([email protected])
Subject: RE: maternal, sibling linkage for antibiotic obesity study?

Our preliminary checks showed we had little data in our Clarity patient 
relationship link fields.  But we have used a fuzzy matching algorithm on 
patient contact info to identify siblings, which isn’t 100% accurate and could 
use some improvement, so our answer was ‘yes’ with caveats. The basic algorithm 
will be shared with gpc-dev soon.

- Angela | UTHSCSA


From: [email protected] [mailto:[email protected]] On Behalf Of Justin Dale
Sent: Thursday, May 07, 2015 12:33 PM
To: Dan Connolly
Cc: [email protected]; GPC Obesity Research Team ([email protected])
Subject: Re: maternal, sibling linkage for antibiotic obesity study?

From the Clarity data dictionary:

Table: PATIENT_2
Field: MOTHER_PAT_ID
Description: The unique ID of the system patient record belonging to the mother 
of this patient. This item is populated if the mother's record is linked to the 
patient record in enterprise registration system Registration's emergency 
contacts. This ID may be encrypted.

Table: PATIENT_2
Field: FATHER_PAT_ID
Description: The unique ID of the system patient record belonging to the father 
of this patient. This item is populated if the father's record is linked to the 
patient record in enterprise registration system Registration's emergency 
contacts. This ID may be encrypted.



Justin

On Thu, May 7, 2015 at 12:19 PM, Dan Connolly <[email protected]> wrote:
Do you know how we get these from CLARITY?

--
Dan
________________________________
From: Campbell, James R [[email protected]]
Sent: Thursday, May 07, 2015 11:59 AM
To: Dan Connolly; [email protected]; GPC Obesity Research Team 
([email protected])
Subject: RE: maternal, sibling linkage for antibiotic obesity study?
Simple links between record IDs of parents and family member could be 
maintained as social history observations using clinical observation codes in 
i2b2:

Identity of mother LOINC 74025-8
Identity of father LOINC 74026-6
Identity of family member  LOINC 74024-1

LOINC also has a substantial number of observables relating to data about the 
mother and child:
Maternal education  57712-2
Maternal pregnancies  75201-4
Maternal marital status 75257-6


James R. Campbell MD
[email protected]
Office 402-559-7505
Secretary 402-559-7299
Fax 402-559-8396
Pager 402-888-1230

From: [email protected] [mailto:[email protected]] On Behalf Of Dan Connolly
Sent: Thursday, May 07, 2015 8:55 AM
To: [email protected]
Subject: maternal, sibling linkage for antibiotic obesity study?

Question 4 of the antibiotic survey 
(#277<https://informatics.gpcnetwork.org/trac/Project/ticket/277>) asks about 
maternal and sibling links. We have an issue in this area, 
#103<https://informatics.gpcnetwork.org/trac/Project/ticket/103>, but we closed 
it as wontfix. Did any of you answer "yes" to 4a/b/c? If so, we should perhaps 
re-open #103.

--
Dan


_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
