There's a lot of work in this area using deep learning and recurrent neural 
network techniques.

https://www.ncbi.nlm.nih.gov/books/NBK285994/ discusses some policy and other 
approaches from a U.S. HIPAA perspective.

John


John P. Rees
Archivist and Digital Resources Manager
History of Medicine Division
National Library of Medicine
301-827-4510



-----Original Message-----
From: Kyle Banerjee [mailto:[email protected]] 
Sent: Friday, May 11, 2018 7:17 PM
To: [email protected]
Subject: [CODE4LIB] Best way to partially anonymize data?

Howdy all,

We need to share large datasets containing medical imagery without revealing 
PHI. The images themselves don't present a problem due to their nature but the 
embedded metadata does.

What approaches might work?

Our first reaction was to encrypt problematic fields, embed a public key for 
each item in the metadata, and have the dataset owner hold a separate private 
key for each image that allows authorized users to decrypt fields.
Keys would be transmitted via the same secure channels that would normally be 
used for authorized PHI.
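
As a rough illustration only, here is a minimal sketch of a simpler, related 
idea using just the Python standard library. It is not the public-key scheme 
described above: instead of encrypting fields, it swaps each sensitive value 
for a keyed HMAC-SHA256 token, with one random secret per image, and the 
dataset owner keeps both the secret and a token-to-value lookup. The field 
names in PHI_FIELDS and the helper names are hypothetical, chosen for the 
example; real image metadata tags would differ.

```python
import hashlib
import hmac
import secrets

# Hypothetical field names for illustration; real metadata tags would differ.
PHI_FIELDS = {"PatientName", "PatientID", "PatientBirthDate"}


def new_item_key() -> bytes:
    """Generate a random 256-bit secret for one image (kept by the dataset owner)."""
    return secrets.token_bytes(32)


def _token(field: str, value: str, item_key: bytes) -> str:
    """Keyed HMAC-SHA256 over 'field=value', returned as 64 hex characters."""
    msg = f"{field}={value}".encode("utf-8")
    return hmac.new(item_key, msg, hashlib.sha256).hexdigest()


def pseudonymize(metadata: dict, item_key: bytes):
    """Return (shared, lookup).

    shared: copy of metadata with each PHI value replaced by its token.
    lookup: token -> original value; this stays with the dataset owner.
    """
    shared, lookup = {}, {}
    for field, value in metadata.items():
        if field in PHI_FIELDS:
            token = _token(field, str(value), item_key)
            shared[field] = token
            lookup[token] = value
        else:
            shared[field] = value
    return shared, lookup


def verify(field: str, claimed_value: str, token: str, item_key: bytes) -> bool:
    """Check whether a claimed value matches a token, given the item's secret."""
    return hmac.compare_digest(_token(field, claimed_value, item_key), token)
```

Because the tokens are one-way, recovering an original value requires the 
owner's lookup table (or a guess checked with verify and the item's secret), 
so this trades the decrypt-on-demand property of the public-key idea for much 
simpler handling. If a secret or lookup is lost, a fresh shared copy can be 
regenerated from the original with a new secret.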

There's an obvious key management problem (any ideas for this -- central store 
would counteract the benefits the keys offer), but I'm not sure if we really 
have to worry about that. Significant key loss would be expected, but since the 
data disseminated is only a copy, a new dataset with new keys could be created 
from the original if keys were lost or known to be compromised.

This approach has a number of flaws, but we're thinking it may be a practical 
way to achieve the effect needed without compromising private data.

Any ideas would be appreciated. Thanks,

kyle
