Vyshali,

You may be interested in format-preserving encryption (FPE) [1] if you need to maintain the format of a field while masking its data. There are also methods to derive a cryptographically secure hash function from encryption [2], so that you can have a “one-way” data transformation that still maintains a given format.
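To make the "one-way but same format" idea above concrete, here is a minimal Python 3 sketch. It is not FPE (it is not FF1/FF3 and it cannot be decrypted), and it is not the construction from [2]. It is just a keyed HMAC-SHA256 stream used to replace each digit with a digit and each ASCII letter with a letter of the same case, leaving everything else untouched. The function names and the demo key are my own assumptions for illustration, and it is not tied to any NiFi processor.

```python
import hashlib
import hmac
import string


def _keystream(key: bytes, value: str, nbytes: int) -> bytes:
    """Expand HMAC-SHA256(key, counter || value) into nbytes pseudorandom bytes."""
    data = value.encode("utf-8")
    out = b""
    counter = 0
    while len(out) < nbytes:
        out += hmac.new(key, counter.to_bytes(4, "big") + data, hashlib.sha256).digest()
        counter += 1
    return out[:nbytes]


def mask_preserving_format(value: str, key: bytes) -> str:
    """Deterministic keyed masking that keeps length and character classes.

    Hypothetical helper for illustration only: digits map to digits, ASCII
    letters to letters of the same case, all other characters are kept.
    """
    # Two keystream bytes per character keeps the modulo bias small.
    stream = _keystream(key, value, 2 * len(value))
    masked = []
    for i, ch in enumerate(value):
        n = int.from_bytes(stream[2 * i:2 * i + 2], "big")
        if ch in string.digits:
            masked.append(string.digits[n % 10])
        elif ch in string.ascii_lowercase:
            masked.append(string.ascii_lowercase[n % 26])
        elif ch in string.ascii_uppercase:
            masked.append(string.ascii_uppercase[n % 26])
        else:
            masked.append(ch)  # separators such as "-" or " " stay in place
    return "".join(masked)


if __name__ == "__main__":
    demo_key = b"demo-key-change-me"  # assumption: a real key would come from a secret store
    print(mask_preserving_format("123-45-6789", demo_key))
```

Two caveats on this sketch: unlike real FPE it is not a permutation, so two different inputs can collide on the same masked value, and for small input spaces (for example nine-digit identifiers) anyone holding the key can rebuild the mapping by brute force.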
I would encourage you to be aware of all attack surfaces here, though. First, there are many examples of anonymization being easily undone because it was not correctly implemented [3], used a weak process [4], or could be reconstructed through associated data [5].

Even with a strong anonymization approach, remember that NiFi tracks data lineage throughout the flow, so a user with sufficient permissions can look at the provenance for a flowfile before and after the anonymization operation and see the original data. This can be partially mitigated by strict access control policies that restrict provenance access to a core group of privileged users.

On top of that, the provenance repository does provide an encrypted implementation, but the content and flowfile repositories currently do not. A malicious user with OS-level access could examine the repository files on disk to extract the original content or flowfile attributes from before they were anonymized. There are open Jiras [6][7] for those efforts.

There is also the issue of a user examining the flowfile via queue listing. There are open Jiras for encrypting attributes [8], hashing attributes [9], and “sensitive attributes” with per-key permissions [10].

I hope this helps to illustrate the complexities of anonymization and leads you to a successful solution.
[1] https://en.wikipedia.org/wiki/Format-preserving_encryption
[2] https://crypto.stackexchange.com/questions/24284/is-there-a-format-preserving-cryptographically-secure-hash
[3] https://dataprivacylab.org/dataprivacy/projects/linkage/lidap-wp19.pdf
[4] https://arstechnica.com/tech-policy/2014/06/poorly-anonymized-logs-reveal-nyc-cab-drivers-detailed-whereabouts/
[5] https://hbr.org/2015/02/theres-no-such-thing-as-anonymous-data
[6] https://issues.apache.org/jira/browse/NIFI-3834
[7] https://issues.apache.org/jira/browse/NIFI-3833
[8] https://issues.apache.org/jira/browse/NIFI-2961
[9] https://issues.apache.org/jira/browse/NIFI-1885
[10] https://issues.apache.org/jira/browse/NIFI-1140

Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On Oct 17, 2017, at 10:36 AM, Mike Thomsen <[email protected]> wrote:
>
> Not if you use hashing. You'll get a field value like this (sha1
> algorithm): c3499c2729730a7f807efb8676a92dcb6f8a3f8f
>
> For getting closer to the original data in the sort of values present,
> you'll need to try something like ARX.
>
> On Tue, Oct 17, 2017 at 11:53 AM, Vyshali <[email protected]> wrote:
>
>> Hi Chris,
>>
>> Hashing using executescript processor means that I should write some coding
>> logic to do that.If so,will the format of the field will remain the same ?
>>
>> Please explain me with examples.
>>
>> Regards,
>> Vyshali
>>
>> --
>> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/
