Phil,

Yes, you understood my question. Thank you.

I'm surprised by "Repeatability: XYZ encrypts to (perhaps) ABC consistently", but if it works as you describe, then I think the impact would not be huge.

My observation is that compressing random ciphers is like compressing random floating-point numbers: keep poking that stick in your eye. I think this applies equally to things like transaction files, where account numbers often repeat for many records, which often are sorted or grouped. If the account number cipher is always the same, then it would get an entry in the dictionary. However, from what you explain, prefixes on account numbers, like the first four digits of a credit card number, would no longer be the same for groups of records, so that would mean some loss of repeatability.

RON HAWKINS
Director, Ipsicsopt Pty Ltd (ACN: 627 705 971)
m: +61 400029610 | t: +1 4085625415 | f: +1 4087912585

-----Original Message-----
From: IBM Mainframe Discussion List <[email protected]> On Behalf Of Phil Smith III
Sent: Friday, 9 August 2019 01:40
To: [email protected]
Subject: Re: [IBM-MAIN] Pervasive Encryption - why?

Ron Hawkins wrote:

>That would be an improvement over a random cypher, but wouldn't the
>length and repeatability of the data patterns after encryption
>negatively affect LZW compression, along with deduplication?

Not sure I understand your question, but I'll try.

Length: unchanged.

Repeatability: XYZ encrypts to (perhaps) ABC consistently, so it's repeatable. But WXYZ does not encrypt to xABC, so is that what you mean about repeatability? Yes, that will affect compression to some extent. My suspicion is that it doesn't make a huge difference: yes, your database of names with ROB and ROBERT and ROBBIE won't compress the ROB part, but there will be some magic convergence of strings in the ciphertext that wasn't there before (less, but some).

But my impression is that compression is the big win on larger fields anyway, like comment fields and the like.
And you probably wouldn't FPE those because they're not structured by definition, so there's not much win there. We do occasionally have customers who want to encrypt, say, comment fields because "some of our reps put SSNs or PANs in those even though they aren't supposed to"; but for those, another AES encryption mode is a better choice anyway. Of course then you still lose some compression!

Note also that in the case above (comment field with possible SSN/PAN) another choice is to FPE just the digits. So:

    Talked to John; he says his SSN is 123-45-6789, but file has 123-44-6789.

Might encrypt to:

    Talked to John; he says his SSN is 761-64-3552, but file has 749-43-7477.

If they also had "Will call him back on the 13th", the "13" would also get encrypted, of course. Kinda weird but it works.

Did I answer the question at all, or am I off in far left field?

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
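
For anyone reading this thread in the archive, the "FPE just the digits" idea in the quoted message can be sketched roughly as below. This is only a toy illustration and not the product's algorithm or a standardized FPE mode such as NIST FF1: the Feistel construction, the round count, the key, and every function name are hypothetical, and treating a dash-separated run of digits as one token is an assumption made here so that two SSNs sharing a prefix do not share a ciphertext prefix. It shows the two properties discussed above: the same input always maps to the same output, and the length and punctuation of the text are unchanged.

```python
import hashlib
import hmac
import re

ROUNDS = 8  # even, so both halves end up back in their original positions
TOKEN = re.compile(r"[0-9]+(?:-[0-9]+)*")  # assumption: dashes join one token


def _num(s: str) -> int:
    return int(s) if s else 0


def _fmt(value: int, width: int) -> str:
    return str(value).zfill(width) if width else ""


def _round_value(key: bytes, i: int, n: int, half: str) -> int:
    # Keyed pseudorandom round function built from HMAC-SHA256.
    msg = f"{i}|{n}|{half}".encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def toy_encrypt_digits(key: bytes, digits: str) -> str:
    """Toy Feistel permutation over decimal strings of a fixed length."""
    n = len(digits)
    u, v = n // 2, n - n // 2
    a, b = digits[:u], digits[u:]
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v
        c = (_num(a) + _round_value(key, i, n, b)) % 10**m
        a, b = b, _fmt(c, m)
    return a + b


def toy_decrypt_digits(key: bytes, digits: str) -> str:
    """Inverse of toy_encrypt_digits: run the rounds backwards."""
    n = len(digits)
    u, v = n // 2, n - n // 2
    a, b = digits[:u], digits[u:]
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        prev_b = a
        prev_a = _fmt((_num(b) - _round_value(key, i, n, prev_b)) % 10**m, m)
        a, b = prev_a, prev_b
    return a + b


def _map_token(token: str, transform) -> str:
    # Transform only the digits; put them back around the original dashes.
    digits = "".join(ch for ch in token if ch.isdigit())
    out = iter(transform(digits))
    return "".join(next(out) if ch.isdigit() else ch for ch in token)


def protect_text(key: bytes, text: str) -> str:
    return TOKEN.sub(
        lambda mt: _map_token(mt.group(), lambda d: toy_encrypt_digits(key, d)),
        text,
    )


def unprotect_text(key: bytes, text: str) -> str:
    return TOKEN.sub(
        lambda mt: _map_token(mt.group(), lambda d: toy_decrypt_digits(key, d)),
        text,
    )


if __name__ == "__main__":
    demo_key = b"hypothetical demo key"
    comment = "Talked to John; he says his SSN is 123-45-6789, but file has 123-44-6789."
    protected = protect_text(demo_key, comment)
    print(protected)
    print(unprotect_text(demo_key, protected))
```

As in the quoted example, any other digits in the comment (such as the "13" in "the 13th") would be transformed too, since the sketch has no way to tell an SSN from any other number.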
