managing PHI risk in lab result text Re: CDM 6.0 review responses from MCW

Dan Connolly Tue, 29 Sep 2020 06:19:22 -0700

FWIW,

KUMC manages this with a kludge of whitelists and regexps... Though I guess I've 
seen worse published as "natural language processing" methods, I agree that 
this isn't the sort of thing we should be doing for the PCORnet CDM.


https://github.com/kumc-bmi/heron/blob/heron-deercreek/heron_load/curated_data/componentids_whitelist.csv
https://github.com/kumc-bmi/heron/blob/heron-deercreek/heron_load/curated_data/lab_regex.csv
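
For readers who have not opened those files, here is a minimal sketch of the general allowlist-plus-regexp idea. It is not the HERON code: the component IDs, the patterns, and the function name scrub_result_text are all made up for illustration, and the layout of the real CSV files may differ.

```python
import re

# Hypothetical allowlist: component IDs whose text results may be released
# at all. Real IDs would come from a curated file.
ALLOWED_COMPONENT_IDS = {"1001", "1002"}

# Hypothetical patterns: only result text that fully matches one of these
# is passed through; anything else is suppressed.
SAFE_RESULT_PATTERNS = [
    re.compile(r"(?i)\s*(positive|negative|detected|not detected)\s*"),
    re.compile(r"\s*[<>]?=?\s*\d+(\.\d+)?\s*"),  # e.g. "<5" or "12.3"
]


def scrub_result_text(component_id, text):
    """Return the stripped text if it passes both checks, else None."""
    if component_id not in ALLOWED_COMPONENT_IDS or text is None:
        return None
    if any(p.fullmatch(text) for p in SAFE_RESULT_PATTERNS):
        return text.strip()
    return None
```

The point of matching the whole string rather than searching within it is that a note such as "neg per Dr. Smith" is dropped entirely instead of being partially released. Every new kind of result needs a new hand-written pattern, which is why this approach is hard to recommend network-wide.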


--
Dan

________________________________
From: Gpc-dev <gpc-dev-boun...@listserv.kumc.edu> on behalf of Susan Rea 
<susan....@imail.org>
Sent: Monday, September 28, 2020 7:17 PM
To: Manuel, Laura S M <manue...@uthscsa.edu>; Stoddard, Alexander 
<astodd...@mcw.edu>; gpc-dev@listserv.kumc.edu <gpc-dev@listserv.kumc.edu>
Subject: RE: CDM 6.0 review responses from MCW

Thank you, Laura and Alex, for reviewing the changes.  I have a few comments on 
your comments, added inline below, marked with *SR*.  We put our local team 
together to answer their Survey a few weeks ago so had an earlier chance for 
input.

Thanks,
Susan Rea

-----Original Message-----
From: Gpc-dev <gpc-dev-boun...@listserv.kumc.edu> On Behalf Of Manuel, Laura S M
Sent: Friday, September 25, 2020 11:15 AM
To: Stoddard, Alexander <astodd...@mcw.edu>; gpc-dev@listserv.kumc.edu
Subject: RE: CDM 6.0 review responses from MCW


Thanks Alex for diving into this first and stating things so eloquently.
I agree with everything Alex put and would add:

>If we move the records from VITAL to OBS_CLIN, we need to merge the
>valuesets for the provenance fields. If we do that, OBSCLIN_SOURCE would 
>contain OD (Order/EHR), RG (Registry/ancillary system) and HC (Healthcare 
>delivery setting).
>There is a fair amount of overlap between these terms.  We are proposing to 
>deprecate OD and RG and utilize HC instead (we will make the same change to 
>OBSGEN_SOURCE as well).
>Any concerns with this change?

Registries normally contain chart-abstracted data, which can be useful, but 
abstraction also adds a step where human error can occur. I believe it would 
be useful to keep the distinction between potentially interpreted data and raw 
data from the EHR.

>Addition of Result_text
This value would likely be a free text field and this may allow PHI values to 
slip through. We would not recommend the addition of a free text field in a 
limited data set.

*SR* I agree we may have privacy issues populating free text, where we do find 
providers may use any convenient place to put a little note.  So, we would have 
to carefully curate whatever data were requested. It would be helpful if DRNOC 
would identify specific lab tests that are needed or anticipated and narrow the 
curation task to what may be useful lab data.  The "everything you got" 
strategy would really be difficult for sharing text results.  Also, I like 
Alex's comment about bloating the table and his solution.  /*SR*

>Addition of Raw Condition Text
This value is a free text field at one of our institutions and a value set at 
another. We could use the value set, but would not be able to add a free text 
string to a limited data set.

*SR* We have the patient-reported reason for visit as free text. It appears to 
be a literal short version of what the patient tells the clerk when making an 
appointment, or what they or family told the admitting clerk.  We also have 
the coding specialists' Admitting Dx ICD code for hospital visits.  We would 
also be suspicious of PHI in this text.  Unsure of the value proposition for 
this versus intake nursing notes, if hospital or ED encounter. The patient may 
not be a reliable reporter of symptoms if they are acutely ill at admission, 
or making an appointment for a sensitive problem. /*SR*

-----Original Message-----
From: Stoddard, Alexander <astodd...@mcw.edu>
Sent: Thursday, September 24, 2020 10:02 PM
To: gpc-dev@listserv.kumc.edu
Cc: Taylor, Bradley <btay...@mcw.edu>; rwait...@kumc.edu; Manuel, Laura S M 
<manue...@uthscsa.edu>
Subject: CDM 6.0 review responses from MCW

Hello GPC-DEV,

MCW agreed to review the CDM 6.0 spec during the dev call 2020-09-22. The 
replies to DRNOC, using an Excel file template (available at 
https://pcornet.imeetcentral.com/drnoc-workgroups/folder/WzIwLDEzMTI2ODA5XQ/ ), 
have been requested by end of day Friday 2020-09-25.

Below is a text version of the responses that I will be sending on behalf of 
MCW.


Main questions seeking feedback
-------------------------------------

>As the CDM has grown in size, the image included in the specification (Page 9) 
> conveys less and less information.
>Any concerns if it is deleted?
Not a concern, but a highlighted list of changed tables/new columns on a single 
page would be useful.

>Suggestions on what we might consider as a replacement?
A machine-readable, diff-able and version-controlled schema definition would be 
very useful. Potentially this would allow tool-assisted SQL generation for the 
different RDBMSs, or even visualization generation. A candidate for such a 
schema definition format would be that used by the SQLAlchemy Python package: 
https://docs.sqlalchemy.org/en/13/core/metadata.html
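
To make the idea concrete, here is a rough sketch of a schema held as plain data and rendered to DDL for more than one database. It deliberately uses only the Python standard library as a stand-in rather than SQLAlchemy itself; the Column and Table classes, the TYPE_MAP contents, and the column lengths are assumptions for illustration, and the OBS_CLIN fragment is not the official table definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    kind: str          # abstract type: "text" or "date"
    length: int = 0    # only used for "text"
    nullable: bool = True


@dataclass(frozen=True)
class Table:
    name: str
    columns: tuple


# Abstract type -> concrete type template per target database (assumed mapping).
TYPE_MAP = {
    "postgresql": {"text": "VARCHAR({n})", "date": "DATE"},
    "oracle": {"text": "VARCHAR2({n})", "date": "DATE"},
}


def create_table_sql(table, dialect):
    """Render a CREATE TABLE statement for one dialect."""
    types = TYPE_MAP[dialect]
    lines = []
    for col in table.columns:
        sql_type = types[col.kind].format(n=col.length)
        null = "" if col.nullable else " NOT NULL"
        lines.append(f"  {col.name} {sql_type}{null}")
    return f"CREATE TABLE {table.name} (\n" + ",\n".join(lines) + "\n)"


# Illustrative fragment only; lengths are guesses, not taken from the spec.
OBS_CLIN = Table("OBS_CLIN", (
    Column("OBSCLINID", "text", 50, nullable=False),
    Column("OBSCLIN_START_DATE", "date"),
    Column("OBSCLIN_SOURCE", "text", 2),
))

print(create_table_sql(OBS_CLIN, "oracle"))
```

Because the definition is ordinary text under version control, a change between CDM versions shows up as a one-line diff, and the same source can feed SQL generation or a diagram.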

>Are there any concerns about the strategy to deprecate VITAL and move the 
>records to OBS_CLIN?
OBS_CLIN is a much better data model for vitals, but the transition moves 
distinct columns in the VITAL table into a single column that requires 
different value-sets for different qualitative variables. That will be easier 
with a more agile and open process for value-set definition during the 
transition.  Open appending of additional values to a version-controlled 
value-set reference would offer projects much greater flexibility to adopt 
additional tests and observations throughout the CDM lifecycle without any 
loss of specificity, accuracy or backwards compatibility. This is especially 
true of _QUAL columns, which will hold values for many different 
results/observations, unlike the domain-specific columns historically defined 
using the current process (e.g. RACE in the DEMOGRAPHIC table and SMOKING in 
the VITAL table).

In general qualitative value-sets should be defined on the codes used to 
specify given observation rows, not the whole _QUAL column.
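
A small sketch of what "value-sets defined on the codes" could mean in practice. The observation codes and the allowed values below are placeholders invented for this example, not entries from any CDM value-set.

```python
# Hypothetical value sets keyed by observation code, not by column.
VALUESETS_BY_CODE = {
    "EXAMPLE_SMOKING_CODE": {"CURRENT", "FORMER", "NEVER", "UNKNOWN"},
    "EXAMPLE_BP_POSITION_CODE": {"SITTING", "STANDING", "SUPINE"},
}


def is_valid_qual(obs_code, qual_value):
    """Check a _QUAL value against the value set for its own code.

    Codes without a registered value set are reported as invalid so that a
    missing definition is noticed rather than silently accepted.
    """
    allowed = VALUESETS_BY_CODE.get(obs_code)
    return allowed is not None and qual_value in allowed
```

With a single column-wide value set, "SITTING" would have to be accepted for a smoking row as well; keying by code rejects it, and adding a new observation only means appending one new entry.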

>If we move the records from VITAL to OBS_CLIN, we need to merge the
>valuesets for the provenance fields. If we do that, OBSCLIN_SOURCE would 
>contain OD (Order/EHR), RG (Registry/ancillary system) and HC (Healthcare 
>delivery setting).
>There is a fair amount of overlap between these terms.  We are proposing to 
>deprecate OD and RG and utilize HC instead (we will make the same change to 
>OBSGEN_SOURCE as well).
>Any concerns with this change?
EHR vs Registry seems like a valid source distinction. From experience the 
source fields are most often useful for data tracing in QC operations on 
individual records, rather than research and aggregation of the data. A richer 
value-set may therefore be of benefit to sites.

>Is the description for Telehealth encounters sufficient, or is more detail 
>needed?
Description is sufficient but the real issue is likely the specificity with 
which these encounters (vs routine telephone or other electronic 
communications) are recorded in the source systems of sites.

>If we remove the VALUESET and VALUESET DESCRIPTOR columns from the
>FIELDS tab of the parseable file, would that pose a problem? (The
>VALUESETS tab would remain unchanged)
No problem. The data in these columns is much more easily used as represented 
in the VALUESETS tab. A flag or categorical value to indicate a field uses a 
valueset on the VALUESETS tab would be useful.

General Comments
---------------------
None

Value Sets
-----------
See comments on the VITALS transition.

LAB_HISTORY table
---------------------
No particular issues with the schema definition. But MCW remains very dubious 
of the utility or accuracy possible with this table versus a centrally held one 
maintained by DRNOC.

If a lab test is stable enough and well defined enough for population reference 
ranges (but doesn't have individual test normal ranges defined for a particular 
source) then a centrally maintained reference fallback is reasonable.

When an assay does not have generalizable normal ranges, e.g. when run relative 
to a variable arbitrary reference and/or varying from machine to machine, then 
you really need a per record reference for the normal range and this table will 
be insufficiently granular and misleading.
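
The two cases above can be sketched as a lookup that prefers the range reported with the individual record and only then falls back to a central reference. The field names (code, norm_low, norm_high, unit) and the reference entry are hypothetical, and the numbers are placeholders, not clinical values.

```python
# Hypothetical central reference table: test code -> (low, high, unit).
CENTRAL_REFERENCE = {"EXAMPLE_TEST": (1.0, 2.0, "unit")}


def normal_range(record, central=CENTRAL_REFERENCE):
    """Prefer the range reported with the record; else fall back centrally."""
    low, high = record.get("norm_low"), record.get("norm_high")
    if low is not None and high is not None:
        return (low, high, record.get("unit"))
    # Returns None when the test has no generalizable reference range.
    return central.get(record["code"])
```

A result of None for an assay without a generalizable range is the honest answer; a table-level default would be misleading there.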

The spec reads 'Every record in this table should be unique.' but this is 
trivially true given each row has an arbitrary LABHISTORYID and uniqueness is 
otherwise undefined.
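
As an illustration of the point, a uniqueness check only means something once the spec names the columns that make up the natural key. In this sketch the choice of key columns is an assumption made up for the example; only LABHISTORYID is taken from the text above.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return natural-key tuples that occur more than once in rows."""
    counts = Counter(tuple(row.get(c) for c in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Two rows that differ only in the arbitrary surrogate key.
rows = [
    {"LABHISTORYID": "1", "LAB_LOINC": "X", "SEX": "F", "RESULT_UNIT": "u"},
    {"LABHISTORYID": "2", "LAB_LOINC": "X", "SEX": "F", "RESULT_UNIT": "u"},
    {"LABHISTORYID": "3", "LAB_LOINC": "X", "SEX": "M", "RESULT_UNIT": "u"},
]

# Including the surrogate key makes every row "unique" by construction.
print(duplicate_keys(rows, ["LABHISTORYID", "LAB_LOINC", "SEX", "RESULT_UNIT"]))
# Checking a stated natural key finds the real duplicate.
print(duplicate_keys(rows, ["LAB_LOINC", "SEX", "RESULT_UNIT"]))
```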

New / Modified fields
------------------------
LAB_RESULT_CM.RESULT_TEXT       - Implementation concern: in MCW's experience 
    SAS expands varchar columns to their maximum width, so this will bloat 
    table size if a column is sparsely populated with large records. A 
    separate relational table with text results keyed by LAB_RESULT_CM_ID 
    would be much more efficient.
ENCOUNTER.ENCOUNTER_TYPE        - No comment
ENCOUNTER.ADMITTING_SOURCE      - No comment
CONDITION.CONDITION_SOURCE      - Guidance on the expected source of Chief 
    Complaint would be useful. Should it always be linked to an ENCOUNTER?
CONDITION.RAW_CONDITION_TEXT    - No comment
OBS_CLIN.OBSCLIN_START_DATE     - No comment
OBS_CLIN.OBSCLIN_START_TIME     - No comment
OBS_CLIN.OBSCLIN_STOP_DATE      - No comment
OBS_CLIN.OBSCLIN_STOP_TIME      - No comment
OBS_CLIN.OBSCLIN_SOURCE         - May be better to maintain EHR / Registry 
    source distinction
OBS_CLIN.OBSCLIN_ABN_IND        - No comment
OBS_GEN.OBSGEN_START_DATE       - No comment
OBS_GEN.OBSGEN_START_TIME       - No comment
OBS_GEN.OBSGEN_STOP_DATE        - No comment
OBS_GEN.OBSGEN_STOP_TIME        - No comment
OBS_GEN.OBSGEN_SOURCE           - May be better to maintain EHR / Registry 
    source distinction
OBS_GEN.OBSGEN_ABN_IND          - No comment
OBS_GEN.OBSGEN_TABLE_MODIFIED   - No comment
HARVEST.CDM_VERSION             - No comment
HARVEST.TOKEN_ENCRYPTION_KEY    - Is TOKEN_ENCRYPTION_KEY_NAME a better name? 
    Please give an example in guidance.
HARVEST.OBSCLIN_START_DATE_MGMT - No comment
HARVEST.OBSCLIN_STOP_DATE_MGMT  - No comment
HARVEST.OBSGEN_START_DATE_MGMT  - No comment
HARVEST.OBSGEN_STOP_DATE_MGMT   - No comment
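
The side-table suggestion for text results in the first item of this list can be sketched as follows, using an in-memory SQLite database purely for demonstration. The table name LAB_RESULT_TEXT and the sample rows are hypothetical; only the LAB_RESULT_CM_ID key comes from the comment itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LAB_RESULT_CM (
    LAB_RESULT_CM_ID TEXT PRIMARY KEY,
    RESULT_NUM REAL
);
-- Hypothetical side table: one row only for results that have text.
CREATE TABLE LAB_RESULT_TEXT (
    LAB_RESULT_CM_ID TEXT PRIMARY KEY
        REFERENCES LAB_RESULT_CM (LAB_RESULT_CM_ID),
    RESULT_TEXT TEXT NOT NULL
);
""")
conn.executemany(
    "INSERT INTO LAB_RESULT_CM VALUES (?, ?)",
    [("r1", 4.2), ("r2", None), ("r3", 7.0)],
)
conn.execute(
    "INSERT INTO LAB_RESULT_TEXT VALUES (?, ?)",
    ("r2", "example narrative result"),
)

# Text is joined in only when a query actually needs it.
rows = conn.execute("""
    SELECT l.LAB_RESULT_CM_ID, l.RESULT_NUM, t.RESULT_TEXT
    FROM LAB_RESULT_CM AS l
    LEFT JOIN LAB_RESULT_TEXT AS t
        ON t.LAB_RESULT_CM_ID = l.LAB_RESULT_CM_ID
    ORDER BY l.LAB_RESULT_CM_ID
""").fetchall()
```

The wide text column then costs storage only for the rows that carry text, and the main lab table keeps its fixed, narrow shape.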

Best regards,
Alex Stoddard

Programmer/Analyst
Biomedical Informatics
Clinical & Translational Science Institute
Medical College of Wisconsin
astodd...@mcw.edu

I am currently working remotely
--------------------------------------------------


_______________________________________________
Gpc-dev mailing list
Gpc-dev@listserv.kumc.edu
http://listserv.kumc.edu/mailman/listinfo/gpc-dev

