Andy LoPresto created NIFI-5582:
-----------------------------------

             Summary: Integrate legacy behavior of HashAttribute into 
CryptographicHashAttribute
                 Key: NIFI-5582
                 URL: https://issues.apache.org/jira/browse/NIFI-5582
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Extensions
    Affects Versions: 1.7.1
            Reporter: Andy LoPresto
            Assignee: Andy LoPresto


There has been discussion on the mailing lists regarding the use of the 
existing {{HashAttribute}} ({{HA}}) processor and the introduction of 
{{CryptographicHashAttribute}} ({{CHA}}). The behavior of these processors does not 
currently overlap, but {{CHA}} can be made to include {{HA}}'s functionality for 
the documented use cases (generating a unique identifier over a set of 
specific attributes and values), if not its exact behavior. 

*From discussion*

{quote}
Given your well-described use cases for HA, I think I may be able to provide 
that in CHA as well. I would expect to add a dropdown property descriptor (PD) for “attribute 
enumeration style” and offer “individual” (each hash is generated on a single 
attribute), “list” (each hash is generated over an ordered, delimited list of 
literal matches), and “regex” (each hash is generated over an ordered list of 
all attribute names matching the provided regex). Then the dynamic properties 
would describe the output, as happens in the existing PR. Maybe a custom 
delimiter property is needed too, but for now the empty string ‘’ could be used to join the 
values. I’ll write up a Jira for this, and hopefully you can both let me know 
if this meets your requirements. 

Example:

*Incoming Flowfile*

attributes: [username: “alopresto”, role: “security”, email: 
“[email protected]”, git_account: “alopresto”]

*CHA Properties (Individual)*

attribute_enumeration_style: “individual”
(dynamic) username_sha256: “username”
(dynamic) git_account_sha256: “git_account”

*Behavior (Individual)*

username_sha256 = git_account_sha256 = $(echo -n "alopresto" | shasum -a 256) = 
600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23

*Resulting Flowfile (Individual)*
 
attributes: [username: “alopresto”, role: “security”, email: 
“[email protected]”, git_account: “alopresto”, username_sha256: 
“600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”, 
git_account_sha256: 
“600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”]

*CHA Properties (List)*

attribute_enumeration_style: “list”
(dynamic) username_and_email_sha256: “username, email”
(dynamic) git_account_sha256: “git_account”

*Behavior (List)*

username_and_email_sha256 = $(echo -n "[email protected]" | shasum 
-a 256) = 22a11b7b3173f95c23a1f434949ec2a2e66455b9cb26b7ebc90afca25d91333f
git_account_sha256 = $(echo -n "alopresto" | shasum -a 256) = 
600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23

*Resulting Flowfile (List)*
 
attributes: [username: “alopresto”, role: “security”, email: 
“[email protected]”, git_account: “alopresto”, username_and_email_sha256: 
“22a11b7b3173f95c23a1f434949ec2a2e66455b9cb26b7ebc90afca25d91333f”, 
git_account_sha256: 
“600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”]

*CHA Properties (Regex)*

attribute_enumeration_style: “regex”
(dynamic) all_sha256: “.*”
(dynamic) git_account_sha256: “git_account”

*Behavior (Regex)*

all_sha256 = sort(attributes_that_match_regex) = [email, git_account, role, 
username] = $(echo -n "[email protected]" | shasum 
-a 256) = b370fdf0132933cea76e3daa3d4a437bb8c571dd0cd0e79ee5d7759cf64efced
git_account_sha256 = $(echo -n "alopresto" | shasum -a 256) = 
600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23

*Resulting Flowfile (Regex)*
 
attributes: [username: “alopresto”, role: “security”, email: 
“[email protected]”, git_account: “alopresto”, all_sha256: 
“b370fdf0132933cea76e3daa3d4a437bb8c571dd0cd0e79ee5d7759cf64efced”, 
git_account_sha256: 
“600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”]
{quote}
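As a rough illustration of the three proposed enumeration styles, the following Java sketch builds the string that would be hashed for each style and then computes a lowercase hex SHA-256 digest, comparable to the {{shasum -a 256}} output quoted above. It is not the NiFi implementation. {{EnumerationStyleSketch}}, {{EnumerationStyle}} and {{buildPreimage}} are hypothetical names. The sketch makes several assumptions: values are joined with the empty-string delimiter mentioned in the quote, a missing attribute contributes nothing, and regex matching requires a full-name match. The email address is a placeholder.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class EnumerationStyleSketch {

    // Hypothetical names for the three proposed "attribute enumeration style" options.
    enum EnumerationStyle { INDIVIDUAL, LIST, REGEX }

    // Builds the string to hash; values are joined with the empty-string delimiter.
    static String buildPreimage(Map<String, String> attributes, EnumerationStyle style, String spec) {
        List<String> names = new ArrayList<>();
        switch (style) {
            case INDIVIDUAL:
                // A single literal attribute name.
                names.add(spec.trim());
                break;
            case LIST:
                // Comma-delimited literal names, hashed in the order they are listed.
                for (String name : spec.split(",")) {
                    names.add(name.trim());
                }
                break;
            case REGEX:
                // Every attribute name that fully matches the regex, sorted lexicographically.
                Pattern pattern = Pattern.compile(spec);
                names = attributes.keySet().stream()
                        .filter(name -> pattern.matcher(name).matches())
                        .sorted()
                        .collect(Collectors.toList());
                break;
        }
        StringBuilder sb = new StringBuilder();
        for (String name : names) {
            // Assumption: a missing attribute contributes nothing to the preimage.
            sb.append(attributes.getOrDefault(name, ""));
        }
        return sb.toString();
    }

    // Lowercase hex SHA-256 of the UTF-8 bytes, like `echo -n "..." | shasum -a 256`.
    static String sha256Hex(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is unavailable", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder email; the original address is not reproduced here.
        Map<String, String> attributes = Map.of(
                "username", "alopresto",
                "role", "security",
                "email", "user@example.com",
                "git_account", "alopresto");
        String preimage = buildPreimage(attributes, EnumerationStyle.LIST, "username, email");
        System.out.println(preimage + " -> " + sha256Hex(preimage));
    }
}
```

Because the regex style sorts the matched names, its output does not depend on the iteration order of the attribute map. Because the email is a placeholder, the printed digest will not match the values quoted above.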

This will necessitate reversing the orientation of the dynamic properties in 
{{CryptographicHashAttribute}}. Rather than each dynamic property mapping 
**existing_attribute_name** to {{new_attribute_name_containing_hash}}, it will map 
**new_attribute_name_containing_hash** to {{existing_attribute_name}}, so that the 
value can reference more than one attribute, e.g. {{attribute_.*}} or 
{{attribute_1, attribute_2}}. 
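To illustrate the reversed orientation, this Java sketch parses dynamic properties whose key is the output attribute name and whose value is a comma-delimited list of source attribute names (the "list" style). {{PropertyOrientationSketch}} and {{parseDynamicProperties}} are hypothetical names, not NiFi API. Under the "regex" style the value would instead be compiled as a single pattern, and this sketch does not attempt that.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PropertyOrientationSketch {

    // Old orientation: key = existing attribute, value = new attribute name.
    // Each key names exactly one source attribute, so a multi-attribute hash cannot be expressed.

    // New orientation: key = new attribute name, value = one or more source attribute names.
    // Assumption: list-style values are split on commas; regex-style values would not be split.
    static Map<String, List<String>> parseDynamicProperties(Map<String, String> dynamicProperties) {
        // LinkedHashMap keeps the properties in the order they were supplied.
        Map<String, List<String>> sourcesByOutput = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : dynamicProperties.entrySet()) {
            List<String> sources = Arrays.stream(entry.getValue().split(","))
                    .map(String::trim)
                    .filter(s -> !s.isEmpty())
                    .collect(Collectors.toList());
            sourcesByOutput.put(entry.getKey(), sources);
        }
        return sourcesByOutput;
    }

    public static void main(String[] args) {
        Map<String, String> dynamicProperties = new LinkedHashMap<>();
        dynamicProperties.put("username_and_email_sha256", "username, email");
        dynamicProperties.put("git_account_sha256", "git_account");
        parseDynamicProperties(dynamicProperties)
                .forEach((output, sources) -> System.out.println(output + " <- " + sources));
    }
}
```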
There will also be a boolean flag to include the attribute name in the hashed 
value:

Example: 

*existing_attribute_name* - "some value"

If **true**, the value in *new_attribute_name_containing_hash* would be 
{{hash("existing_attribute_namesome value")}}. If **false**, it would be 
{{hash("some value")}}. Because no one is using the new 
{{CryptographicHashAttribute}} in the field yet, now is the only time this 
breaking change can be made. 
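The proposed boolean flag can be sketched in Java as a choice of preimage before hashing. {{IncludeNameSketch}} and {{preimage}} are hypothetical names, and this is not the processor's actual code. It covers only the single-attribute case described above. How names and values would be interleaved when several attributes are hashed together is not specified in this issue.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class IncludeNameSketch {

    // Hypothetical helper: the string to hash for one attribute, honoring the proposed flag.
    static String preimage(String attributeName, String attributeValue, boolean includeAttributeName) {
        // true  -> name immediately followed by value, no delimiter ("existing_attribute_namesome value")
        // false -> value only ("some value")
        return includeAttributeName ? attributeName + attributeValue : attributeValue;
    }

    // Lowercase hex SHA-256 of the UTF-8 bytes.
    static String sha256Hex(String input) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is unavailable", e);
        }
    }

    public static void main(String[] args) {
        String name = "existing_attribute_name";
        String value = "some value";
        System.out.println("true:  " + sha256Hex(preimage(name, value, true)));
        System.out.println("false: " + sha256Hex(preimage(name, value, false)));
    }
}
```

Because no delimiter separates name and value, distinct name/value pairs such as ("a", "bc") and ("ab", "c") produce the same preimage. A configurable delimiter, as floated in the quoted discussion, would avoid that.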

[Mailing list 
discussion|https://lists.apache.org/thread.html/7defc9dbcb5e900a66bfc58b3e96f4860397dab0f0859c27f2e72061@%3Cusers.nifi.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
