Andy LoPresto created NIFI-5582:
-----------------------------------
Summary: Integrate legacy behavior of HashAttribute into
CryptographicHashAttribute
Key: NIFI-5582
URL: https://issues.apache.org/jira/browse/NIFI-5582
Project: Apache NiFi
Issue Type: Improvement
Components: Extensions
Affects Versions: 1.7.1
Reporter: Andy LoPresto
Assignee: Andy LoPresto
There has been discussion on the mailing lists regarding the use of the
existing {{HashAttribute}} processor and the introduction of
{{CryptographicHashAttribute}}. The behavior of these processors does not
currently overlap, but {{CryptographicHashAttribute}} ({{CHA}}) can be made to
cover the documented use cases of {{HashAttribute}} ({{HA}}), namely generating
a unique identifier over a set of specific attributes and values, if not its
exact behavior.
*From discussion*
{quote}
Given your well-described use cases for HA, I think I may be able to provide
that in CHA as well. I would expect to add a dropdown PD for “attribute
enumeration style” and offer “individual” (each hash is generated on a single
attribute), “list” (each hash is generated over an ordered, delimited list of
literal matches), and “regex” (each hash is generated over an ordered list of
all attribute names matching the provided regex). Then the dynamic properties
would describe the output, as happens in the existing PR. Maybe a custom
delimiter property is needed too, but for now ‘’ could be used to join the
values. I’ll write up a Jira for this, and hopefully you can both let me know
if this meets your requirements.
Example:
*Incoming Flowfile*
attributes: [username: “alopresto”, role: “security”, email:
“[email protected]”, git_account: “alopresto”]
*CHA Properties (Individual)*
attribute_enumeration_style: “individual”
(dynamic) username_sha256: “username”
(dynamic) git_account_sha256: “git_account”
*Behavior (Individual)*
username_sha256 = git_account_sha256 = $(echo -n "alopresto" | shasum -a 256) =
600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23
*Resulting Flowfile (Individual)*
attributes: [username: “alopresto”, role: “security”, email:
“[email protected]”, git_account: “alopresto”, username_sha256:
“600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”,
git_account_sha256:
“600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”]
*CHA Properties (List)*
attribute_enumeration_style: “list”
(dynamic) username_and_email_sha256: “username, email”
(dynamic) git_account_sha256: “git_account”
*Behavior (List)*
username_and_email_sha256 = $(echo -n "[email protected]" | shasum
-a 256) = 22a11b7b3173f95c23a1f434949ec2a2e66455b9cb26b7ebc90afca25d91333f
git_account_sha256 = $(echo -n "alopresto" | shasum -a 256) =
600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23
*Resulting Flowfile (List)*
attributes: [username: “alopresto”, role: “security”, email:
“[email protected]”, git_account: “alopresto”, username_and_email_sha256:
“22a11b7b3173f95c23a1f434949ec2a2e66455b9cb26b7ebc90afca25d91333f”,
git_account_sha256:
“600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”]
*CHA Properties (Regex)*
attribute_enumeration_style: “regex”
(dynamic) all_sha256: “.*”
(dynamic) git_account_sha256: “git_account”
*Behavior (Regex)*
all_sha256 = sort(attributes_that_match_regex) = [email, git_account, role,
username] = $(echo -n "[email protected]" | shasum
-a 256) = b370fdf0132933cea76e3daa3d4a437bb8c571dd0cd0e79ee5d7759cf64efced
git_account_sha256 = $(echo -n "alopresto" | shasum -a 256) =
600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23
*Resulting Flowfile (Regex)*
attributes: [username: “alopresto”, role: “security”, email:
“[email protected]”, git_account: “alopresto”, all_sha256:
“b370fdf0132933cea76e3daa3d4a437bb8c571dd0cd0e79ee5d7759cf64efced”,
git_account_sha256:
“600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”]
{quote}
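The three enumeration styles quoted above can be sketched roughly as follows. This is only an illustration in Python, not the processor's implementation: the function names {{select_values}} and {{hash_attributes}} are hypothetical, SHA-256 over UTF-8 bytes is assumed from the {{shasum -a 256}} examples, the regex is assumed to require a full match on the attribute name, attributes named in a property but absent from the flowfile are assumed to be skipped, and the email value is a placeholder.

```python
import hashlib
import re


def select_values(attributes, style, spec, delimiter=""):
    """Return the joined string to hash for one dynamic property (hypothetical helper)."""
    if style == "individual":
        # One hash per single attribute.
        names = [spec]
    elif style == "list":
        # Ordered, comma-delimited list of literal attribute names.
        names = [name.strip() for name in spec.split(",")]
    elif style == "regex":
        # All attribute names matching the regex, in sorted order.
        # Assumption: the whole name must match.
        pattern = re.compile(spec)
        names = sorted(name for name in attributes if pattern.fullmatch(name))
    else:
        raise ValueError(f"unknown enumeration style: {style}")
    # Assumption: names missing from the flowfile are skipped.
    return delimiter.join(attributes[name] for name in names if name in attributes)


def hash_attributes(attributes, style, dynamic_properties, delimiter=""):
    """Return a copy of the attributes with one hash added per dynamic property."""
    result = dict(attributes)
    for output_name, spec in dynamic_properties.items():
        # Selection always reads the incoming attributes, never the new hashes.
        joined = select_values(attributes, style, spec, delimiter)
        result[output_name] = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return result


# Placeholder flowfile attributes; the email value is illustrative only.
attributes = {
    "username": "alopresto",
    "role": "security",
    "email": "user@example.com",
    "git_account": "alopresto",
}

individual = hash_attributes(
    attributes,
    "individual",
    {"username_sha256": "username", "git_account_sha256": "git_account"},
)
listed = hash_attributes(
    attributes, "list", {"username_and_email_sha256": "username, email"}
)
matched = hash_attributes(attributes, "regex", {"all_sha256": ".*"})
```

In this sketch the output attribute name is the dictionary key and the selector is the value, which is what lets a single property name a list or a regex.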
This will necessitate switching the "order" of dynamic properties in
{{CryptographicHashAttribute}} -- rather than the dynamic property name being
**existing_attribute_name** and its value {{new_attribute_name_containing_hash}},
the name will be **new_attribute_name_containing_hash** and the value
{{existing_attribute_name}}, which allows other values such as {{attribute_.*}} or
{{attribute_1, attribute_2}}.
There will also be a boolean flag to include the attribute name in the hashed
value:
Example:
*existing_attribute_name* - "some value"
If **true**, the value in *new_attribute_name_containing_hash* would be
{{hash("existing_attribute_namesome value")}}. If **false**, it would be
{{hash("some value")}}. Because no one is using the new
{{CryptographicHashAttribute}} in the field yet, now is the only time this
change can be made without breaking existing flows.
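A minimal sketch of the proposed flag, for illustration only: the function name {{hash_value}} and the parameter {{include_name}} are hypothetical, and SHA-256 over UTF-8 bytes with no separator between name and value is assumed from the example above.

```python
import hashlib


def hash_value(name, value, include_name):
    """Hash an attribute value, optionally prefixed with its attribute name."""
    # Assumption: the name is prepended directly, with no delimiter.
    material = name + value if include_name else value
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


with_name = hash_value("existing_attribute_name", "some value", True)
without_name = hash_value("existing_attribute_name", "some value", False)
```

With the flag enabled, two attributes holding the same value produce different hashes, because the attribute name becomes part of the hashed input.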
[Mailing list
discussion|https://lists.apache.org/thread.html/7defc9dbcb5e900a66bfc58b3e96f4860397dab0f0859c27f2e72061@%3Cusers.nifi.apache.org%3E]