Hi Kyle,

I’m curious if you can share more information about the storage, access, and 
intended use of the datasets. Depending on the answers, you might look at some 
of the following:

- creating two datasets: one with PHI scrubbed for wider distribution, and one 
with PHI intact that is restricted to authorized users
- if the use case includes tracking individuals over time, you may need to 
implement pseudonymization so records can still be linked without exposing 
identifiers
- scrubbing some PHI and de-identifying other fields, depending on the use 
cases for the dataset
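The pseudonymization idea in the list above can be sketched quickly. This is a minimal Python illustration, assuming a keyed hash (HMAC) over a patient identifier; the field names and key are hypothetical, and in practice the key must be stored separately from the dataset:

```python
import hmac
import hashlib

def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier with a keyed hash.

    The same ID always maps to the same pseudonym, so an individual can
    be tracked across records, but the original identifier cannot be
    recovered without the secret key.
    """
    return hmac.new(secret_key, patient_id.encode(), hashlib.sha256).hexdigest()[:16]

# Hypothetical key -- keep it out of the distributed dataset.
key = b"keep-this-secret"

p1 = pseudonymize("MRN-0012345", key)
p2 = pseudonymize("MRN-0012345", key)
p3 = pseudonymize("MRN-9999999", key)

assert p1 == p2   # stable: same patient, same pseudonym
assert p1 != p3   # distinct patients stay distinct
```

A plain unkeyed hash would not be enough here, since medical record numbers have limited entropy and could be brute-forced; the secret key is what prevents that.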

More options may emerge with more information. :-) Encrypting the data does 
not anonymize it, but it does control access to it, so if you must keep the 
raw metadata, then access controls would be a key part of handling PHI in 
your dataset.
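To make the encryption-as-access-control point concrete, here is a minimal sketch assuming Python and the third-party `cryptography` package. Kyle's proposal used public/private key pairs per item; this uses a single symmetric key purely to illustrate that whoever holds the key controls access to the field, and the record fields shown are hypothetical:

```python
from cryptography.fernet import Fernet

# Hypothetical scheme: the dataset owner keeps the key and shares it only
# with authorized users over an existing secure channel.
key = Fernet.generate_key()
f = Fernet(key)

# Illustrative metadata record: the image-level field stays in the clear,
# while the PHI field is encrypted in place.
record = {"modality": "MR", "patient_name": "DOE^JANE"}
record["patient_name"] = f.encrypt(record["patient_name"].encode())

# Anyone without the key sees only ciphertext; an authorized user holding
# the key can recover the original field value.
recovered = Fernet(key).decrypt(record["patient_name"]).decode()
assert recovered == "DOE^JANE"
```

Note that the ciphertext is still *identifiable* data under HIPAA if the key exists somewhere, which is exactly why this controls access rather than anonymizing.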

Since you are dealing with PHI, you are probably already aware that you need 
to keep within the rules and regulations set by HIPAA and HITECH. One more 
thing to keep in mind is that any state laws that go above and beyond HIPAA 
also apply - see the California Confidentiality of Medical Information Act 
as an example of such a state law. 
If you have access to a legal department or something similar, they would be 
better able to guide you in these matters.

Thanks,
Becky



> On May 11, 2018, at 4:17 PM, Kyle Banerjee <[email protected]> wrote:
> 
> Howdy all,
> 
> We need to share large datasets containing medical imagery without
> revealing PHI. The images themselves don't present a problem due to their
> nature but the embedded metadata does.
> 
> What approaches might work?
> 
> Our first reaction was to encrypt problematic fields, embed a public key
> for each item in the metadata, and have the dataset owner hold a separate
> private key for each image that allows authorized users to decrypt fields.
> Keys would be transmitted via the same secure channels that would normally
> be used for authorized PHI.
> 
> There's an obvious key management problem (any ideas for this? -- a central
> store would counteract the benefits the keys offer), but I'm not sure if we
> really have to worry about that. Significant key loss would be expected, but
> since the disseminated data is only a copy, a new dataset with new keys
> could be created from the original if keys were lost or known to be
> compromised.
> 
> This approach has a number of flaws, but we're thinking it may be a
> practical way to achieve the effect needed without compromising private
> data.
> 
> Any ideas would be appreciated. Thanks,
> 
> kyle
