I can foresee that my organization will soon need to provide test data to an external vendor. This test data will need to be generated by masking subsets of real production data, since crafting fictional test data would be an impossible undertaking in the time we have available.
All Personally Identifiable Information (PII) fields must therefore be masked. I have worked out techniques to mask names and addresses, but I now need a technique to mask a nine-digit numeric key. This field is used as either a primary or secondary key in many files, so I can't just substitute a random number: the relationships between files must be maintained.

I have identified some requirements for the masking algorithm:

(1) It must be deterministic (the same input always produces the same output).
(2) Uniqueness must be maintained: no two original values may translate to the same masked value.
(3) The masked result must also be a nine-digit numeric value.
(4) It must not be possible to calculate the original value from the masked value (i.e. a one-way transformation).

I can think of many ways to address the first three requirements, but I am stuck on number (4). The closest I can get to meeting it is to assume that the masking algorithm itself is kept secret, and I know that security through obscurity is hardly a good plan.

Do any of the listers have an idea for such a masking algorithm?

John
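For illustration only, here is a rough sketch of one direction that might satisfy the four requirements: a keyed permutation of the nine-digit space, so that the secret is a key rather than the algorithm. Requirements (2) and (3) together mean the mapping is a permutation of 000000000..999999999, and any permutation can be inverted by someone who holds everything needed to compute it, so (4) can only hold against parties who do not have the key. The sketch assumes a Feistel network with HMAC-SHA256 as the round function, plus "cycle-walking" to stay inside the nine-digit range. The names mask_key, DOMAIN and ROUNDS, the round count, and the key handling are all hypothetical choices for this example, not a vetted design.

```python
import hashlib
import hmac

DOMAIN = 10**9                    # nine-digit keys: 000000000..999999999
HALF_BITS = 15                    # 2 * 15 = 30 bits, and 2**30 >= 10**9
HALF_MASK = (1 << HALF_BITS) - 1
ROUNDS = 8                        # assumed round count for this sketch


def _round(key: bytes, rnd: int, half: int) -> int:
    """Keyed round function: HMAC-SHA256 output truncated to 15 bits."""
    msg = bytes([rnd]) + half.to_bytes(2, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big") & HALF_MASK


def _feistel(key: bytes, value: int) -> int:
    """Balanced Feistel network: a permutation of 0..2**30 - 1."""
    left, right = value >> HALF_BITS, value & HALF_MASK
    for rnd in range(ROUNDS):
        left, right = right, left ^ _round(key, rnd, right)
    return (left << HALF_BITS) | right


def mask_key(key: bytes, original: str) -> str:
    """Map a nine-digit string to another nine-digit string."""
    if len(original) != 9 or not (original.isascii() and original.isdigit()):
        raise ValueError("expected exactly nine digits")
    value = _feistel(key, int(original))
    # Cycle-walk: re-apply the permutation until the result is back
    # inside the nine-digit domain. This keeps the mapping one-to-one.
    while value >= DOMAIN:
        value = _feistel(key, value)
    return f"{value:09d}"
```

Calling mask_key(secret, "123456789") with the same secret always yields the same nine-digit string, so the same call can be applied to every file that carries the key field and the relationships survive. The secret would have to stay inside the organization and never go to the vendor. Note also that the nine-digit space is small, so a scheme like this hides the original values only as well as the key is protected.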