I can foresee that my organization will soon need to provide test data to an 
external vendor.  This test data will need to be generated by masking subsets 
of real production data, since crafting fictional test data would be an 
impossible undertaking in the time we have available.



So all Personally Identifiable Information (PII) fields must be masked.  I have 
figured out techniques to mask names and addresses.  But I now need to figure 
out a technique to mask a nine-digit numeric key.  This field is used as either 
a primary or secondary key in many files, so I can't just substitute a random 
number: the relationships between files need to be maintained.  I have 
identified some requirements for the masking algorithm:



(1) It must be deterministic (same input produces same output always).

(2) Uniqueness must be maintained.  Therefore no two original values can 
translate to the same masked value.

(3) The masked result must also be a nine-digit numeric value.

(4) It must not be possible to calculate the original value from the masked 
value (i.e. a one-way transformation).



I can think of many ways to address the first three requirements.  But I am 
stuck on number (4).  The closest I can get to meeting this requirement is to 
assume that the masking algorithm itself is kept secret.  And I know that 
security through obscurity is hardly a good plan.
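
To illustrate the problem, here is a minimal sketch of one approach that 
covers (1) through (3).  It is written in Python purely for illustration, and 
the constants and function names are made up for the example.  Multiplying by 
a constant that shares no factor with 10**9 just shuffles the nine-digit key 
space, so the result is deterministic and unique.  But anyone who learns the 
constants can undo it, which is exactly where requirement (4) fails.

```python
# Illustrative sketch only: MULTIPLIER and OFFSET are hypothetical constants.
MODULUS = 10**9             # nine-digit key space: 000000000..999999999
MULTIPLIER = 387_420_489    # 3**18; must not be divisible by 2 or 5
OFFSET = 123_456_789


def mask_key(key: int) -> int:
    """Map a nine-digit key to another nine-digit key, one-to-one."""
    return (key * MULTIPLIER + OFFSET) % MODULUS


def unmask_key(masked: int) -> int:
    """Reverse mask_key; possible for anyone who knows the constants."""
    inverse = pow(MULTIPLIER, -1, MODULUS)  # modular inverse, Python 3.8+
    return ((masked - OFFSET) * inverse) % MODULUS


def format_key(value: int) -> str:
    """Render with leading zeros so the result is always nine digits."""
    return f"{value:09d}"


if __name__ == "__main__":
    original = 123_456_789
    masked = mask_key(original)
    print(format_key(masked))
    print(unmask_key(masked) == original)  # the masking is reversible
```

The unmask_key function is the whole difficulty: nothing protects the 
original values except keeping the constants secret.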



Do any of the listers have an idea for such a masking algorithm?



John




----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: INFO IBM-MAIN
