[
https://issues.apache.org/jira/browse/NIFI-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andy LoPresto updated NIFI-5582:
--------------------------------
Description:
There has been discussion on the mailing lists regarding the use of the
existing {{HashAttribute}} ({{HA}}) processor and the introduction of
{{CryptographicHashAttribute}} ({{CHA}}). The behavior of these processors does not
currently overlap, but {{CHA}} can be extended to cover {{HA}}'s functionality for
the documented use cases (generating a unique identifier over a set of
specific attributes and values), if not its exact behavior.
*From discussion*
{quote}
Given your well-described use cases for HA, I think I may be able to provide
that in CHA as well. I would expect to add a dropdown property descriptor (PD) for “attribute
enumeration style” offering “individual” (each hash is generated over a single
attribute), “list” (each hash is generated over the values of an ordered, delimited list of
literal attribute names), and “regex” (each hash is generated over the values of all attributes
whose names match the provided regex, ordered by name). The dynamic properties
would then describe the output, as in the existing PR. A custom
delimiter property may also be needed, but for now the empty string (‘’) could be used to join the
values. I’ll write up a Jira for this; hopefully you can both let me know
if it meets your requirements.
Example:
*Incoming Flowfile*
attributes: [username: “alopresto”, role: “security”, email:
“[email protected]”, git_account: “alopresto”]
*CHA Properties (Individual)*
attribute_enumeration_style: “individual”
(dynamic) username_sha256: “username”
(dynamic) git_account_sha256: “git_account”
*Behavior (Individual)*
username_sha256 = git_account_sha256 = $(echo -n "alopresto" | shasum -a 256) =
600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23
*Resulting Flowfile (Individual)*
attributes: [username: “alopresto”, role: “security”, email:
“[email protected]”, git_account: “alopresto”, username_sha256:
“600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”,
git_account_sha256:
“600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”]
*CHA Properties (List)*
attribute_enumeration_style: “list”
(dynamic) username_and_email_sha256: “username, email”
(dynamic) git_account_sha256: “git_account”
*Behavior (List)*
username_and_email_sha256 = $(echo -n "[email protected]" | shasum -a 256) =
22a11b7b3173f95c23a1f434949ec2a2e66455b9cb26b7ebc90afca25d91333f
git_account_sha256 = $(echo -n "alopresto" | shasum -a 256) =
600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23
*Resulting Flowfile (List)*
attributes: [username: “alopresto”, role: “security”, email:
“[email protected]”, git_account: “alopresto”, username_and_email_sha256:
“22a11b7b3173f95c23a1f434949ec2a2e66455b9cb26b7ebc90afca25d91333f”,
git_account_sha256:
“600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”]
*CHA Properties (Regex)*
attribute_enumeration_style: “regex”
(dynamic) all_sha256: “.*”
(dynamic) git_account_sha256: “git_account”
*Behavior (Regex)*
all_sha256 = sort(attributes_that_match_regex) = [email, git_account, role, username] =
$(echo -n "[email protected]" | shasum -a 256) =
b370fdf0132933cea76e3daa3d4a437bb8c571dd0cd0e79ee5d7759cf64efced
git_account_sha256 = $(echo -n "alopresto" | shasum -a 256) =
600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23
*Resulting Flowfile (Regex)*
attributes: [username: “alopresto”, role: “security”, email:
“[email protected]”, git_account: “alopresto”, all_sha256:
“b370fdf0132933cea76e3daa3d4a437bb8c571dd0cd0e79ee5d7759cf64efced”,
git_account_sha256:
“600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”]
{quote}
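The three proposed enumeration styles can be sketched in Java, the language NiFi processors are written in. This is a minimal, hypothetical sketch and not the {{CryptographicHashAttribute}} implementation: the class, enum, and method names are illustrative assumptions, dynamic properties are modeled as a plain map from output attribute name to selector (the new orientation), values are joined with a caller-supplied delimiter (the empty string in the discussion), list entries naming absent attributes are assumed to be skipped, and the email address is a placeholder.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed "attribute enumeration style" behavior.
// All names here are illustrative; this is not NiFi's actual API.
public class EnumerationStyleSketch {

    enum EnumerationStyle { INDIVIDUAL, LIST, REGEX }

    // Lowercase hex SHA-256 of the UTF-8 bytes, like `echo -n "..." | shasum -a 256`.
    static String sha256Hex(String input) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Resolves a selector (a dynamic property value) to an ordered list of attribute names.
    static List<String> resolve(String selector, EnumerationStyle style, Map<String, String> attributes) {
        switch (style) {
            case INDIVIDUAL: {
                return Collections.singletonList(selector.trim());
            }
            case LIST: {
                // Keep the configured order: "username, email" -> [username, email]
                List<String> names = new ArrayList<>();
                for (String name : selector.split(",")) {
                    names.add(name.trim());
                }
                return names;
            }
            case REGEX: {
                // Full-match the regex against every attribute name, sorted lexicographically
                Pattern pattern = Pattern.compile(selector);
                List<String> names = new ArrayList<>();
                for (String name : new TreeSet<>(attributes.keySet())) {
                    if (pattern.matcher(name).matches()) {
                        names.add(name);
                    }
                }
                return names;
            }
            default:
                throw new IllegalArgumentException("Unknown style: " + style);
        }
    }

    // Dynamic properties use the new orientation: key = output attribute, value = selector.
    static Map<String, String> hashAttributes(Map<String, String> attributes,
                                              Map<String, String> dynamicProperties,
                                              EnumerationStyle style,
                                              String delimiter) {
        Map<String, String> results = new LinkedHashMap<>();
        for (Map.Entry<String, String> property : dynamicProperties.entrySet()) {
            List<String> values = new ArrayList<>();
            for (String name : resolve(property.getValue(), style, attributes)) {
                String value = attributes.get(name);
                if (value != null) { // assumption: absent attributes are skipped
                    values.add(value);
                }
            }
            results.put(property.getKey(), sha256Hex(String.join(delimiter, values)));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("username", "alopresto");
        attributes.put("role", "security");
        attributes.put("email", "alopresto@example.com"); // placeholder address
        attributes.put("git_account", "alopresto");

        Map<String, String> listProperties = new LinkedHashMap<>();
        listProperties.put("username_and_email_sha256", "username, email");
        listProperties.put("git_account_sha256", "git_account");

        Map<String, String> regexProperties = new LinkedHashMap<>();
        regexProperties.put("all_sha256", ".*");

        // Empty-string delimiter, as proposed "for now" in the discussion
        System.out.println(hashAttributes(attributes, listProperties, EnumerationStyle.LIST, ""));
        System.out.println(hashAttributes(attributes, regexProperties, EnumerationStyle.REGEX, ""));
    }
}
```

Joining with the empty string keeps the sketch simple but is ambiguous: the value lists ["ab", "c"] and ["a", "bc"] produce the same input and therefore the same hash, which is one argument for the custom delimiter property mentioned in the discussion.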
The proposed enumeration styles will require reversing the orientation of dynamic properties in
{{CryptographicHashAttribute}}. Currently, the dynamic property name is the
*existing_attribute_name* and its value is the {{new_attribute_name_containing_hash}}; after the change,
the property name will be the *new_attribute_name_containing_hash* and its value will select the
{{existing_attribute_name}}, which allows values such as {{attribute_.*}} (regex) or
{{attribute_1, attribute_2}} (list).
There will also be a boolean flag controlling whether the attribute name is included in the hashed
value:
Example:
*existing_attribute_name* - "some value"
If _true_, the value of *new_attribute_name_containing_hash* would be
{{hash("existing_attribute_namesome value")}} (name and value concatenated with no separator). If _false_,
it would be {{hash("some value")}}. Because no one is using the new
{{CryptographicHashAttribute}} in the field yet, now is the only time this breaking change can be made
without affecting existing flows.
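The proposed flag can be sketched the same way. This hypothetical Java sketch (names are illustrative, not NiFi API) assumes the name is prepended to the value with no separator, exactly as in {{hash("existing_attribute_namesome value")}}, and uses SHA-256 as the hash algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of the proposed "include attribute name" flag.
// Names are illustrative; this is not NiFi's actual API.
public class IncludeNameSketch {

    // Lowercase hex SHA-256 of the UTF-8 bytes of the input.
    static String sha256Hex(String input) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // true:  hash(name + value), e.g. hash("existing_attribute_namesome value")
    // false: hash(value),        e.g. hash("some value")
    static String hashAttribute(String name, String value, boolean includeName) {
        return sha256Hex(includeName ? name + value : value);
    }

    public static void main(String[] args) {
        // Same value under two names, as with username and git_account in the Individual example
        System.out.println(hashAttribute("username", "alopresto", false));
        System.out.println(hashAttribute("git_account", "alopresto", false)); // same as the line above
        System.out.println(hashAttribute("username", "alopresto", true));
        System.out.println(hashAttribute("git_account", "alopresto", true));  // differs from the line above
    }
}
```

Including the name matters when different attributes share a value: in the Individual example, {{username}} and {{git_account}} are both "alopresto" and receive identical hashes when the flag is _false_, but distinct hashes when it is _true_.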
[Mailing list
discussion|https://lists.apache.org/thread.html/7defc9dbcb5e900a66bfc58b3e96f4860397dab0f0859c27f2e72061@%3Cusers.nifi.apache.org%3E]
> Integrate legacy behavior of HashAttribute into CryptographicHashAttribute
> --------------------------------------------------------------------------
>
> Key: NIFI-5582
> URL: https://issues.apache.org/jira/browse/NIFI-5582
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Affects Versions: 1.7.1
> Reporter: Andy LoPresto
> Assignee: Andy LoPresto
> Priority: Major
> Labels: attribute, cryptography, hash
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)